Preface



This volume contains the papers presented at the 6th International Workshop
on Approximation Algorithms for Combinatorial Optimization Problems
(APPROX 2003) and the 7th International Workshop on Randomization and
Approximation Techniques in Computer Science (RANDOM 2003), which took
place concurrently at Princeton University during August 24-26, 2003. APPROX
focuses on algorithmic and complexity issues surrounding the development of
efficient approximate solutions to computationally hard problems, and this was the
sixth in the series, after Aalborg (1998), Berkeley (1999), Saarbrücken (2000),
Berkeley (2001), and Rome (2002). RANDOM is concerned with applications of
randomness to computational and combinatorial problems, and this was the
seventh workshop in the series, following Bologna (1997), Barcelona (1998), Berkeley
(1999), Geneva (2000), Berkeley (2001), and Harvard (2002).

Topics of interest for APPROX and RANDOM are: design and analysis of
randomized algorithms, randomized complexity theory, design and analysis of
approximation and online algorithms, complexity of approximation problems,
random combinatorial structures, error-correcting codes, pseudorandomness and
derandomization, network models and algorithms, average-case analysis,
property testing, expander graphs and randomness extractors, random walks, Markov
chains, probabilistic proof systems, random projections and embeddings,
computational learning, randomness in cryptography, and various applications.

The volume contains 16 + 17 (APPROX + RANDOM) contributed papers,
selected by the two program committees from 40 + 34 submissions received in
response to the call for papers.

We would like to thank all of the authors who submitted papers, the members
of the program committees, and the external subreferees: Dimitris Achlioptas,
Andris Ambainis, Matthew Andrews, Aaron Archer, Nikhil Bansal, Luca Becchetti,
Christian Borgs, Moses Charikar, Shuchi Chawla, Bernard Chazelle, Joseph
Cheriyan, Don Coppersmith, Artur Czumaj, Bhaskar Dasgupta, Nikhil Devanur,
Adrian Dumitrescu, Martin Dyer, Leah Epstein, Eldar Fischer, Rosario Gennaro,
Catherine Greenhill, Sudipto Guha, Shirley Halevy, Shlomo Hoory, Sandy Irani,
Yuval Ishai, Mark Jerrum, Ryan Johnston, Ravi Kannan, Anna Karlin, Howard
Karloff, Michal Karonski, Claire Kenyon, Sanjeev Khanna, Subhash Khot, Alexei
Kitaev, Michael Krivelevich, Amit Kumar, Vijay Kumar, Xiang-Yang Li, Vincenzo
Liberatore, Laci Lovász, Avner Magen, Mohammad Mahdian, Adam Meyerson,
Michael Mitzenmacher, Kousha Moaveni-Nejad, Michael Molloy, Cris Moore,
Elchanan Mossel, Moni Naor, Ashwin Nayak, Gaia Nicosia, Andrew Odlyzko,
Alessandro Panconesi, Christos Papadimitriou, Rene Peralta, Yuval Rabani,
R. Ravi, Oded Regev, Yossi Richter, Adi Rosen, Alex Russell, Amin Saberi,
Mohammad R. Salavatipour, Guido Schaefer, Rene Sitters, Angelika Steger,
Kunal Talwar, Prasad Tetali, Luca Trevisan, Kasturi Varadarajan, Umesh
Vazirani, Santosh Vempala, Jacques Verstraete, Anastasios Viglas, Eric Vigoda,
Berthold Voecking, Peng-Jun Wan, Peter Winkler, and David Zuckerman.
Correlation Clustering with Partial Information



Erik D. Demaine and Nicole Immorlica* 

Laboratory for Computer Science, MIT, Cambridge, MA 02139, USA. 
{edemaine, nickle}@theory.lcs.mit.edu.



Abstract. We consider the following general correlation-clustering
problem [1]: given a graph with real edge weights (both positive and
negative), partition the vertices into clusters to minimize the total absolute
weight of cut positive edges and uncut negative edges. Thus, large
positive weights (representing strong correlations between endpoints)
encourage those endpoints to belong to a common cluster; large negative weights
encourage the endpoints to belong to different clusters; and weights with
small absolute value represent little information. In contrast to most
clustering problems, correlation clustering specifies neither the desired
number of clusters nor a distance threshold for clustering; both of these
parameters are effectively chosen to be the best possible by the problem
definition.

Correlation clustering was introduced by Bansal, Blum, and Chawla [1],
motivated by both document clustering and agnostic learning. They
proved NP-hardness and gave constant-factor approximation algorithms
for the special case in which the graph is complete (full information) and
every edge has weight +1 or -1. We give an O(log n)-approximation
algorithm for the general case based on a linear-programming rounding
and the "region-growing" technique. We also prove that this linear
program has a gap of Ω(log n), and therefore our approximation is tight
under this approach. We also give an O(r^3)-approximation algorithm for
K_{r,r}-minor-free graphs. On the other hand, we show that the problem is
APX-hard, and any o(log n)-approximation would require improving the
best approximation algorithms known for minimum multicut.



1 Introduction 

Clustering objects into groups is a common task that arises in many applications 
such as data mining, web analysis, computational biology, facility location, data 
compression, marketing, machine learning, pattern recognition, and computer 
vision. Clustering algorithms for these and other objectives have been heavily 
investigated in the literature. For partial surveys, see e.g. [6, 11, 14, 15, 16, 18]. 

In a theoretical setting, the objects are usually viewed as points in either a 
metric space (typically finite) or a general distance matrix, or as vertices in a 
graph. Typical objectives include minimizing the maximum diameter of a cluster
(k-clustering) [8], minimizing the average distance between pairs of clustered

* Research was supported in part by an NSF GRF. 

S. Arora et al. (Eds.): APPROX 2003 + RANDOM 2003, LNCS 2764, pp. 1-13, 2003.

© Springer-Verlag Berlin Heidelberg 2003
points (k-clustering sum) [17], minimizing the maximum distance to a "centroid
object" chosen for each cluster (k-center) [8], minimizing the average distance
to such a centroid object (k-median) [10], minimizing the average squared
distance to an arbitrary centroid point (k-means) [11], and maximizing the sum
of distances between pairs of objects in different clusters (maximum k-cut) [13].
These objectives interpret the distance between points as a measure of their
dissimilarity: the larger the distance, the more dissimilar the objects. Another
line of clustering algorithms interprets the distance between points as a measure
of their similarity: the larger the distance, the more similar the objects. In this
case, the typical objective is to find a k-clustering that minimizes the sum of
distances between pairs of objects in different clusters (minimum k-cut) [13].
All of these clustering problems are parameterized by the desired number k of
clusters. Without such a restriction, these clustering objective functions would
be optimized when k = n (every object is in a separate cluster) or when k = 1
(all objects belong to a single cluster).

In the correlation-clustering problem, introduced by Bansal et al. [1], the
underlying model is that objects can be truly categorized, and we are given
probabilities about pairs of objects belonging to common categories. For
example, the multiset of objects might consist of all authors of English literature,
and two authors belong to the same category if they correspond to the same
real person. This task would be easy if authors published papers consistently
under the same name. However, some authors might publish under several
different names such as William Shakespeare, W. Shakespeare, Bill Shakespeare,
Sir Francis Bacon, Edward de Vere, and Queen Elizabeth I. Given some
confidence about the similarity and dissimilarity of the names, our goal is to cluster
the objects to maximize the probability of correctness. As we consider both
similarity and dissimilarity measures, our objective is in a sense a generalization of
the typical clustering objectives mentioned above. In fact, an appropriate
interpretation of our problem instance suggests that our objective is a combination
of the minimum k-clustering sum and minimum k-cut clustering objectives.

An interesting property of our problem is that the number k of clusters is no
longer a parameter of the input; there is some "ideal" k which the algorithm must
find. Another clustering problem with this property is location area design, a
problem arising in cell phone network design. As formulated by Bejerano et al. [2],
this problem attempts to minimize the sum of the sizes squared of the clusters
plus the weight of the cut induced by the clustering. The authors provide an
O(log n) approximation for this problem in general graphs using region-growing
techniques and an O(r^3) approximation in K_{r,r}-minor-free graphs using a lemma
of Klein et al. [12]. The similarities between these two problems allow us to apply
many of the same techniques.

For our lower bounds, we exploit the similarities between the
correlation-clustering problem and the minimum-multicut problem, introduced by Hu [9].
In the minimum-multicut problem, we are given an edge-weighted graph and
a list of k pairs of vertices, and the goal is to remove edges of minimum total
weight such that the resulting graph disconnects all k input pairs. For k = 1,
this is simply the standard minimum-cut problem. For k = 2, Yannakakis et
al. [21] gave a polynomial-time algorithm. For k ≥ 3, this problem was shown to
be APX-hard by Dahlhaus et al. [4]. Currently, the best known approximation
for this problem in general graphs is an O(log k) approximation algorithm by
Garg et al. [7]. For graphs excluding any K_{r,r} minor, Tardos and Vazirani [19]
use a lemma of Klein et al. [12] to provide an O(r^3) approximation (and thus a
constant approximation for planar graphs).

The only previous work on the correlation-clustering problem is that of
Bansal et al. [1]. Their paper considers correlation clustering in a complete graph
with all edges assigned weights from {+1, -1}, representing that every pair of
objects has an estimate of either "similar" or "dissimilar". They address two main
objective functions, minimizing disagreements and maximizing agreements
between the input estimates and the output clustering. The decision versions of
these two optimization problems are identical and shown to be NP-complete.
For minimizing disagreements, they give a constant-factor approximation via a
combinatorial algorithm. For maximizing agreements, they give a PTAS. Both
algorithms assume the input graph is complete. However, in many applications,
estimate information is incomplete.

In this paper, we consider minimizing disagreements in general graphs and
with arbitrary weights. We give an O(log n)-approximation algorithm for general
graphs and an O(r^3)-approximation algorithm for graphs excluding the complete
bipartite graph K_{r,r} as a minor (e.g., graphs embeddable in surfaces of genus
O(r^2)). Our O(log n) approximation is based on linear programming, rounding,
and the "region growing" technique [13, 7]. Using ideas developed in Bejerano et
al. [2], we are able to prove that this rounding technique yields a good
approximation. We then use a lemma of Klein et al. [12] to extend our results to an
O(r^3) approximation for K_{r,r}-minor-free graphs [19, 2]. We further prove that
the gap in the linear program can be Ω(log n), and therefore our bounds are
tight for any algorithm based on rounding this linear program. We also prove
that our problem is as hard as the APX-hard problem minimum multicut [4], for
which the current best approximation is O(log k) for a certain parameter k [7].
Any o(log n)-approximation algorithm for our problem would require improving
the state of the art for approximating minimum multicut.

Almost simultaneously, two groups of researchers independently obtained
results similar to this paper. Charikar et al. [3] and Emanuel and Fiat [5] both
give O(log n) approximations for the minimization version and
approximation-preserving reductions from minimum multicut, as we do. In addition, Charikar et
al. [3] improve the Bansal et al. [1] result for complete graphs and give a constant-
factor approximation for the maximization version in general graphs. Emanuel
and Fiat [5] also prove the equivalence of this problem with the
minimum-multicut problem.

The rest of this paper is organized as follows. Section 2 formalizes the
correlation-clustering problem and the objective of minimizing disagreements, and presents
the linear-programming formulation. Section 3 demonstrates a rounding
technique that yields an O(log n) approximation for this linear program in general
graphs. Section 4 considers the special case of K_{r,r}-minor-free graphs and uses an
alternate rounding technique to get an O(r^3) approximation in these graphs. In
Section 5, we prove lower bounds, establishing APX-hardness and a logarithmic
gap in the linear program. We conclude with open problems in Section 6.

2 Problem Definition and Linear-Programming Formulation

An instance of the correlation-clustering problem is an undirected graph
G = (V, E) with edge weights c_e ∈ (−∞, +∞) for each e ∈ E. Each edge weight can
be interpreted as a confidence measure of the similarity or dissimilarity of the
edge's endpoints. For example, if there is a function f(u, v) that outputs the
probability of u and v being similar, then a natural assignment of weight to edge
e = (u, v) is c_e = log(f(u, v)/(1 − f(u, v))). Hence, an edge e = (u, v) of weight
c_e > 0 corresponds to a belief that nodes u and v are similar. Larger c_e indicate
higher confidence in this belief. Similarly, an edge weight of c_e < 0 suggests that
u and v are dissimilar. An edge weight of c_e = 0 (or, equivalently, the lack of an
edge between u and v) indicates no belief about the similarity of u and v.
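As a small illustration of the weighting just described, the sketch below maps a similarity probability f(u, v) to the log-odds weight c_e = log(f(u, v)/(1 − f(u, v))). The function name `edge_weight` is hypothetical, and the use of the natural logarithm is an assumption (any fixed base only rescales all weights by the same factor).

```python
import math


def edge_weight(prob_similar: float) -> float:
    """Log-odds weight c_e = log(f / (1 - f)) for a similarity probability f.

    Positive when f > 1/2 (belief that the endpoints are similar), negative
    when f < 1/2, and exactly zero when f = 1/2 (no information).
    """
    if not 0.0 < prob_similar < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return math.log(prob_similar / (1.0 - prob_similar))
```

Probabilities symmetric around 1/2 give weights of equal magnitude and opposite sign, so "confident similar" and "confident dissimilar" are penalized equally.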

In this paper, our goal is to output a partition or clustering 𝒮 = {S_1, ..., S_k}
of the vertices that minimizes disagreements. The disagreements or cost of a
partition is the total weight of the "mistakes", that is, the weight of positive
edges between clusters and the absolute value of the weight of negative edges
within clusters. In the case c_e ∈ {−1, 0, +1}, the cost of a partition is simply the
number of cut positive edges plus uncut negative edges. Intuitively, this
objective penalizes the clustering whenever presumed similar objects are in different
clusters and presumed dissimilar objects are in the same cluster. For the
purposes of approximation algorithms, minimizing disagreements is different from
maximizing agreements (the weight of cut negative edges plus uncut positive
edges).

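We introduce the following notation for the cost of a clustering:

$$
\begin{aligned}
\mathrm{cost}(\mathcal{S}) &= \mathrm{cost}_p(\mathcal{S}) + \mathrm{cost}_m(\mathcal{S}),\\
\mathrm{cost}_p(\mathcal{S}) &= \sum \big\{ |c_e| : e = (u,v) \in E;\ c_e > 0;\ \text{and } \forall i,\ |\{u,v\} \cap S_i| \le 1 \big\},\\
\mathrm{cost}_m(\mathcal{S}) &= \sum \big\{ |c_e| : e = (u,v) \in E;\ c_e < 0;\ \text{and } \exists i,\ |\{u,v\} \cap S_i| = 2 \big\}.
\end{aligned}
$$

We will refer to the optimal clustering as OPT and its cost as cost(OPT).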
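The two cost terms can be evaluated directly from their definitions. The sketch below assumes a hypothetical representation: edges as `(u, v, c_e)` triples and a clustering as a list of disjoint vertex sets; the function name is illustrative only.

```python
def disagreement_cost(edges, clusters):
    """Return (cost_p, cost_m) for a clustering.

    edges:    list of (u, v, c_e) with real weights c_e
    clusters: list of disjoint vertex sets S_1, ..., S_k covering all endpoints
    cost_p sums |c_e| over positive edges whose endpoints lie in different
    clusters; cost_m sums |c_e| over negative edges inside a cluster.
    """
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    cost_p = cost_m = 0.0
    for u, v, c in edges:
        same = label[u] == label[v]
        if c > 0 and not same:
            cost_p += abs(c)   # cut positive edge
        elif c < 0 and same:
            cost_m += abs(c)   # uncut negative edge
    return cost_p, cost_m
```

For example, with edges a-b (+1), b-c (−2), a-c (+0.5), putting all three vertices together pays only the negative edge, (0, 2), while the clustering {a, b}, {c} pays only the cut positive edge a-c, (0.5, 0).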

Previous approximation algorithms. Bansal et al. [1] give a constant-factor
approximation for this problem in the special case of complete graphs with edge
weights in {−1, +1}. Their algorithm is combinatorial. It iteratively "cleans"
clusters until every cluster C is δ-clean (i.e., for every vertex v ∈ C, v has at
least (1 − δ)|C| plus neighbors in C and at most δ|C| plus neighbors outside C).
They bound the approximation factor of their algorithm by counting the number
of "bad" triangles (triangles with two +1 edges and one −1 edge) in a δ-clean
clustering and use the existence of these bad triangles to lower bound OPT.
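Every clustering makes at least one mistake on each bad triangle: if all three vertices share a cluster the −1 edge is uncut, and otherwise at least one +1 edge is cut. Hence a collection of edge-disjoint bad triangles lower-bounds OPT. The sketch below only counts all bad triangles of a small complete ±1 graph by brute force; it illustrates the notion and is not the algorithm of Bansal et al. The `sign` callback is a hypothetical interface.

```python
from itertools import combinations


def count_bad_triangles(n, sign):
    """Count triangles with exactly two +1 edges and one -1 edge.

    n:    number of vertices, labelled 0..n-1 (complete graph)
    sign: function sign(u, v) -> +1 or -1, always called with u < v
    """
    bad = 0
    for a, b, c in combinations(range(n), 3):
        if [sign(a, b), sign(b, c), sign(a, c)].count(-1) == 1:
            bad += 1
    return bad
```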
Complete graphs have many triangles, and the arguments for counting bad
triangles rely heavily on this fact. When we generalize the problem to
graphs that are not necessarily complete, bad triangles no longer form a good
lower bound on OPT. It may be possible to find a combinatorial algorithm for
this problem that bounds the approximation factor by counting bad cycles
(cycles with exactly one minus edge). However, in this paper, we formulate the
problem as a linear program, round it, and use its optimal solution to bound our
approximation factor.



Linear-programming formulation. Consider assigning a zero-one variable x_{uv} to
each pair of vertices (hence x_{uv} = x_{vu}). When (u, v) ∈ E, we will sometimes
write x_{uv} as x_e where it is understood that e = (u, v). Given a clustering, set
x_{uv} = 0 if u and v are in a common cluster, and x_{uv} = 1 otherwise. To express
cost(𝒮) in this notation, notice that 1 − x_e is 1 if edge e is within a cluster and
0 if edge e is between clusters. Define constants

_ J \ce\ if Ce < 0, 

I 0 if Ce > 0, 



and 



Then 



Pe = 



|Ce| if Ce > 0, 
0 if Ce < 0. 



COSt(§) = ^ me(l - Xe) -f ^ PeXe- 
eeE eeE 

Our goal is to find a valid assignment of x_{uv}'s to minimize this cost. An
assignment of x_{uv}'s is valid (corresponds to a clustering) if x_{uv} ∈ {0, 1} and
the x_{uv}'s satisfy the triangle inequality.

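We relax this integer program to the following linear program:

$$
\begin{aligned}
\text{minimize}\quad & \sum_{e\in E} m_e\,(1 - x_e) + \sum_{e\in E} p_e\,x_e\\
\text{subject to}\quad & x_{uv} \in [0, 1]\\
& x_{uv} + x_{vw} \ge x_{uw}\\
& x_{uv} = x_{vu}
\end{aligned}
$$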
Because the solution set to this linear program contains the solution set to the 
integer program, the optimal solution to the linear program is a lower bound on 
the cost of the optimal clustering. 
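To make the correspondence between clusterings and valid assignments concrete, the sketch below builds the integral x of a clustering, checks the LP constraints by brute force, and evaluates the LP objective with the constants m_e and p_e. The function names and the dictionary-of-pairs representation are hypothetical, and the O(n^3) triangle check is only meant for tiny examples.

```python
from itertools import permutations


def clustering_to_x(vertices, clusters):
    """Integral assignment: x[u, v] = 0 if u and v share a cluster, else 1."""
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    return {(u, v): 0 if label[u] == label[v] else 1
            for u in vertices for v in vertices}


def is_valid(x, vertices, tol=1e-9):
    """Check 0 <= x <= 1, symmetry, and the triangle inequality x_uv + x_vw >= x_uw."""
    for u in vertices:
        for v in vertices:
            if not -tol <= x[u, v] <= 1 + tol or abs(x[u, v] - x[v, u]) > tol:
                return False
    for u, v, w in permutations(vertices, 3):
        if x[u, v] + x[v, w] < x[u, w] - tol:
            return False
    return True


def lp_objective(edges, x):
    """sum_e m_e (1 - x_e) + sum_e p_e x_e for weighted edges (u, v, c_e)."""
    total = 0.0
    for u, v, c in edges:
        m_e = -c if c < 0 else 0.0   # |c_e| for negative edges, else 0
        p_e = c if c > 0 else 0.0    # |c_e| for positive edges, else 0
        total += m_e * (1 - x[u, v]) + p_e * x[u, v]
    return total
```

For an integral assignment the objective equals the clustering's disagreement cost; a fractional x accepted by `is_valid` is exactly a feasible point of the relaxation.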



3 Approximation in General Graphs 

We use the linear-programming formulation of this problem to design an
approximation algorithm. The algorithm first solves the linear program. The resulting
fractional values are interpreted as distances between vertices; close vertices are
most likely similar, far vertices are most likely different. The algorithm then uses
region-growing techniques to group close vertices and thus round the fractional
variables. Using ideas from Bejerano et al. [2], we are able to show that this
approach yields an O(log n) approximation. A modification to this approach,
outlined in Section 4, will yield an O(r^3) approximation for K_{r,r}-minor-free
graphs.

Region growing. We iteratively grow balls of at most some fixed radius
(computed according to the fractional x_{uv} values) around nodes of the graph until
all nodes are included in some ball. These balls define the clusters in the final
approximate solution. As high x_{uv} values hint that u and v should be in
separate clusters, this approach seems plausible. The fixed radius guarantees an
approximation ratio on disagreements within clusters, while the region-growing
technique itself guarantees an approximation ratio on disagreements between
clusters.

First we present some notation that we need to define the algorithm. A ball
B(u, r) of radius r around node u consists of all nodes v such that x_{uv} ≤ r, the
subgraph induced by these nodes, and the fraction (r − x_{uv})/x_{vw} of edges (v, w)
with only endpoint v ∈ B(u, r). The cut cut(S) of a set S of nodes is the cost of
the positive edges with exactly one endpoint in S, i.e.,

$$
\mathrm{cut}(S) = \sum_{(v,w)\in E,\ |\{v,w\}\cap S| = 1} p_{vw} .
$$

The cut of a ball is the cut induced by the set of vertices included in the ball.
The volume vol(S) of a set S of nodes is the weighted distance of the edges with
both endpoints in S, i.e.,

$$
\mathrm{vol}(S) = \sum_{(v,w)\in E,\ \{v,w\}\subseteq S} p_{vw}\,x_{vw} .
$$

Finally, the volume of a ball is the volume of B(u, r) including the fractional
weighted distance of edges leaving B(u, r). In other words, if (v, w) ∈ E is a cut
edge of ball B(u, r) with v ∈ B(u, r) and w ∉ B(u, r), then (v, w) contributes
p_{vw} · x_{vw} · (r − x_{uv})/x_{vw} = p_{vw} (r − x_{uv}) weight to the volume of ball
B(u, r). For technical reasons, we also include an initial volume I in the volume
of every ball (i.e., ball B(u, 0) has volume I).

Algorithm. We can now present the algorithm for rounding a fractional solution
FRAC to an integral solution SOL. Suppose the volume of the entire graph is
F, and thus cost_p(FRAC) = F. Let the initial volume I of the balls defined in
the algorithm be F/n.

Algorithm Round

1. Pick any node u in G.

2. Initialize r to 0.

3. Grow r by min{(x_{uv} − r) > 0 : v ∉ B(u, r)} so that B(u, r) includes another
entire edge, and repeat until cut(B(u, r)) ≤ c ln(n + 1) × vol(B(u, r)).

4. Output the vertices in B(u, r) as one of the clusters in 𝒮.

5. Remove vertices in B(u, r) (and incident edges) from G.

6. Repeat Steps 1-5 until G is empty.
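A minimal Python sketch of Algorithm Round follows, assuming the LP has already been solved and its values are supplied as a dictionary `x` over all ordered vertex pairs (with x[(a, a)] = 0), and that only positive edges enter cut and volume. The function and parameter names are hypothetical, Step 1 picks a node deterministically, and the default c = 2.5 is an arbitrary value with c > 2. One interpretive assumption: since Lemma 1 below is stated for the continuous family of radii, the sketch also checks the termination condition between consecutive vertex distances, where the ball's vertex set and cut stay fixed while its volume grows at rate equal to the cut.

```python
import math


def region_growing_round(vertices, p_edges, x, c=2.5):
    """Sketch of Algorithm Round (region growing on the LP distances x).

    vertices: list of vertex ids
    p_edges:  list of (v, w, p_vw) for positive edges, p_vw > 0
    x:        dict of LP distances x[(a, b)] for all ordered pairs, x[(a, a)] = 0
    c:        the constant c > 2 from the analysis
    Returns the clusters of SOL as a list of vertex sets.
    """
    if not vertices:
        return []
    n = len(vertices)
    K = c * math.log(n + 1)                        # threshold factor c ln(n+1)
    F = sum(p * x[v, w] for v, w, p in p_edges)    # volume of the whole graph
    I = F / n                                      # initial volume of each ball
    alive = set(vertices)
    clusters = []

    def cut_and_volume(u, r):
        """Cut and volume of B(u, r) in the graph induced by alive vertices."""
        ball = {v for v in alive if x[u, v] <= r}
        cut, vol = 0.0, I
        for v, w, p in p_edges:
            if v not in alive or w not in alive:
                continue                       # removed with an earlier cluster
            if v in ball and w in ball:
                vol += p * x[v, w]             # edge entirely inside the ball
            elif v in ball or w in ball:
                a = v if v in ball else w      # endpoint inside the ball
                cut += p
                vol += p * (r - x[u, a])       # part of the edge within radius r
        return ball, cut, vol

    while alive:
        u = min(alive, key=str)                # Step 1: any node
        r = 0.0                                # Step 2
        while True:
            ball, cut, vol = cut_and_volume(u, r)
            if cut <= K * vol:
                break
            r_next = min(x[u, v] for v in alive if x[u, v] > r)
            # Until r_next the vertex set is fixed, so the condition
            # cut <= K * (vol + cut * (s - r)) first holds at s = r_star.
            r_star = r + (cut - K * vol) / (K * cut)
            if r_star < r_next:
                r = r_star
                break
            r = r_next                         # Step 3: next vertex joins
        clusters.append(ball)                  # Step 4
        alive -= ball                          # Step 5 (outer loop is Step 6)
    return clusters
```

On a toy instance with two tight triples {0, 1, 2} and {3, 4, 5} joined by one light positive edge at LP distance 1, the sketch stops growing the first ball at a small radius and returns the two triples as clusters.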

In this algorithm, c is some constant which we will determine later. This
algorithm is clearly polynomial and terminates with a solution that satisfies
the constraints. We must show that the resulting cost is not much more than
the original fractional cost. Throughout the analysis section, we will refer to
the optimal fractional solution as FRAC, the solution our algorithm returns as
SOL, and the optimal integral solution as OPT. We also use FRAC(x_{uv}) and
SOL(x_{uv}) to denote the fractional and rounded solutions to the linear program.

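Positive edges. The termination condition on the region-growing procedure
guarantees an O(log n) approximation to the cost of positive edges (between clusters):

$$
\begin{aligned}
\mathrm{cost}_p(\mathrm{SOL}) &= \sum_{(u,v)\in E} p_{uv}\,\mathrm{SOL}(x_{uv})\\
&= \sum_{\text{balls } B} \mathrm{cut}(B)\\
&\le c\ln(n+1) \times \sum_{\text{balls } B} \mathrm{vol}(B)\\
&\le c\ln(n+1) \times \Big( \sum_{(v,w)\in E} p_{vw}\,\mathrm{FRAC}(x_{vw}) + \sum_{\text{balls } B} \frac{F}{n} \Big)\\
&\le c\ln(n+1) \times \big( \mathrm{cost}_p(\mathrm{FRAC}) + F \big)\\
&= 2c\ln(n+1) \times \mathrm{cost}_p(\mathrm{FRAC}),
\end{aligned}
$$

where the second line holds because each positive edge between clusters is
counted exactly once, in the cut of the first ball that contains one of its
endpoints; the fourth line follows from the fact that the balls found by the
algorithm are disjoint (so each edge's weighted distance is counted at most once,
and there are at most n balls); and the last line uses F = cost_p(FRAC).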

The rest of our analysis hinges on the fact that the balls returned by this 
algorithm have radius at most 1/c. This fact follows from the following known 
lemma [20]: 

Lemma 1. For any vertex u and family of balls B(u, r), the condition
cut(B(u, r)) ≤ c ln(n + 1) × vol(B(u, r)) is achieved for some r < 1/c.
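For completeness, here is the standard region-growing calculation behind Lemma 1, sketched under the definitions above (it is the usual argument, not a quotation of [20]). Write V(r) = vol(B(u, r)) and C(r) = cut(B(u, r)), so that V(0) = I. As r increases, every cut edge adds p_{vw} dr to the volume and vertices entering the ball only increase it further, so V grows at rate at least C. If C(r) > c ln(n + 1) V(r) held for every r ∈ [0, 1/c), then

$$
\ln V(1/c) - \ln V(0) \;\ge\; \int_0^{1/c} \frac{C(r)}{V(r)}\,dr \;>\; \frac{1}{c}\cdot c\ln(n+1) = \ln(n+1)
\quad\Longrightarrow\quad
V(1/c) > (n+1)\,I = \frac{(n+1)F}{n} = F + I .
$$

No ball can have volume exceeding F + I (the total weighted distance of the graph plus the initial volume), which is a contradiction.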

Negative edges. As in Bejerano et al. [2], we can use this radius guarantee to
bound the remaining component of our objective function. We see that our
solution gives a c/(c − 2)-approximation to the cost of negative edges (within clusters):

$$
\begin{aligned}
\mathrm{cost}_m(\mathrm{FRAC}) &= \sum_{(u,v)\in E} m_{uv}\,\big(1 - \mathrm{FRAC}(x_{uv})\big)\\
&\ge \sum_{\text{balls } B}\ \sum_{(u,v)\in B\cap E} m_{uv}\,\big(1 - \mathrm{FRAC}(x_{uv})\big)\\
&\ge \sum_{\text{balls } B}\ \sum_{(u,v)\in B\cap E} m_{uv}\,(1 - 2/c)\\
&= (1 - 2/c) \sum_{\text{balls } B}\ \sum_{(u,v)\in B\cap E} m_{uv}\\
&= \frac{c-2}{c}\,\mathrm{cost}_m(\mathrm{SOL}),
\end{aligned}
$$

where the third line follows from the triangle inequality and the 1/c bound on
the radius of the balls. The approximation ratio is O(1) provided c > 2.
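The key inequality in this bound uses only the triangle inequality on the LP distances: any two vertices v, w of a ball B(u, r) with r < 1/c satisfy

$$
x_{vw} \;\le\; x_{vu} + x_{uw} \;<\; \frac{1}{c} + \frac{1}{c} = \frac{2}{c}
\quad\Longrightarrow\quad
1 - x_{vw} \;>\; 1 - \frac{2}{c} = \frac{c-2}{c},
$$

so every negative edge left inside a cluster is already charged at least a (c − 2)/c fraction of its weight by the LP, which is why c > 2 is required.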

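Overall approximation. Combining these results, we pay a total of

$$
\begin{aligned}
\mathrm{cost}(\mathrm{SOL}) &= \mathrm{cost}_p(\mathrm{SOL}) + \mathrm{cost}_m(\mathrm{SOL})\\
&\le 2c\ln(n+1) \times \mathrm{cost}_p(\mathrm{FRAC}) + \frac{c}{c-2}\,\mathrm{cost}_m(\mathrm{FRAC})\\
&\le \max\Big\{ 2c\ln(n+1),\ \frac{c}{c-2} \Big\}\,\mathrm{cost}(\mathrm{FRAC})
\;\le\; \max\Big\{ 2c\ln(n+1),\ \frac{c}{c-2} \Big\}\,\mathrm{cost}(\mathrm{OPT}),
\end{aligned}
$$

and thus we have an O(ln n) approximation for any constant c > 2; taking c
slightly larger than 2 makes the lead constant 2c just slightly larger than 4, at
the price of a larger factor c/(c − 2) on negative edges.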

4 Approximation in K_{r,r}-Minor-Free Graphs

In K_{r,r}-minor-free graphs, we can use a theorem of Klein et al. [12] to round
our linear program in a way that guarantees an O(r^3) approximation to the cost
of disagreements between clusters. The clusters produced by this rounding have
radius at most 1/c, and thus the rest of the results from the previous section
follow trivially. The theorem states that, in graphs with unit-length edges, there
is an algorithm to find a "small" cut such that the remaining graph has "small"
connected components:

Theorem 1. [12] In a graph G with weight u on the edges which satisfy the
triangle inequality, one can find in polynomial time either a K_{r,r} minor or an
edge cut of weight O(rU/δ) whose removal yields connected components of weak
diameter¹ O(r^2 δ), where U is the total weight of all edges in G.

As in the case of the region-growing technique, this theorem allows us to
cluster the graph cheaply subject to some radius guarantee. As this clustering
cost is independent of n, this technique is typically applied in place of the
region-growing technique to get better approximations for K_{r,r}-minor-free graphs (see,
for example, Tardos and Vazirani [19] or Bejerano et al. [2]). In particular, this
implies constant-factor approximations for planar graphs.

¹ The weak diameter of a connected component in a modified graph is the maximum
distance between two vertices in that connected component as measured in the
original graph. For our purposes, distances are computed according to the x_{uv}, which
satisfy the triangle inequality and are defined on all pairs of vertices, so the weak
diameter equals the diameter.
The idea is as follows. Given a K_{r,r}-minor-free graph G with weights p_e
and edge lengths x_e as defined by the linear program, we subdivide each edge
e into a chain of ⌈k x_e⌉ edges of the same weight p_e for some appropriate k,
yielding a new graph G'. We apply Theorem 1 to G', getting an edge cut F'
which maps to an edge cut F in G of at most the same weight. This creates the
natural correspondence between the resulting components of G' and G. Note
that two nodes at distance d in G are at distance kd in G'. Hence, if we take δ such
that O(r^2 δ) ≤ 2k/c, the components in G will have diameter at most 2/c. It is
sufficient to take δ = O(k/r^2). To bound the weight of the cut F, we just need
to bound the total weight U' of the graph G'. Let U = Σ_{e∈G} p_e be the total
weight of edges in G and recall vol(G) = Σ_{e∈G} p_e x_e. Then

$$
U' = \sum_{e\in G'} p_e = \sum_{e\in G} \lceil k x_e \rceil\, p_e \le \sum_{e\in G} (k x_e + 1)\, p_e = k\,\mathrm{vol}(G) + U .
$$

By Theorem 1, the weight of F is O(rU'/δ) = O(r^3 (k vol(G) + U)/k). Taking
k = U/vol(G), this becomes O(r^3 vol(G)) and is thus an O(r^3) approximation
of the cost of disagreements between clusters, as desired. The size of G' may be
pseudopolynomial in the size of G. However, the algorithm of Klein et al. [12]
consists of r breadth-first searches of G', and these can be implemented without
explicitly subdividing G. Thus, the algorithm is polynomial.
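The bound U' ≤ k vol(G) + U can be checked numerically on a toy edge list. In the sketch below, edges are hypothetical `(p_e, x_e)` pairs and the function name is illustrative; by the ceiling, an edge with x_e = 0 becomes a chain of zero edges and contributes nothing. With k = U/vol(G) the right-hand side equals 2U.

```python
import math


def subdivided_weight(edges, k):
    """Total weight U' of G' after replacing each edge e by a chain of
    ceil(k * x_e) edges, each of weight p_e.

    edges: list of (p_e, x_e) pairs with p_e >= 0 and x_e in [0, 1].
    """
    return sum(math.ceil(k * x_e) * p_e for p_e, x_e in edges)
```

For instance, with edges (p, x) = (2, 0.3), (1, 0.5), (3, 0) we get U = 6, vol(G) = 1.1, k = U/vol(G) ≈ 5.45, chains of 2, 3 and 0 edges, and U' = 7 ≤ k vol(G) + U = 12.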

5 Lower Bounds 

We prove that it is APX-hard to minimize disagreements in correlation
clustering. We use a reduction from the APX-hard problem minimum multicut [4]:
given a graph G and k pairs of nodes P = {(u_1, v_1), ..., (u_k, v_k)}, find a set
of edges of minimum weight that, when removed, separate each pair of nodes
p ∈ P.

Theorem 2. An r-approximation for minimizing disagreements in correlation
clustering implies an r-approximation for the minimum-multicut problem.

Proof. Given a multicut instance G', construct graph G as follows. For every
edge in G' of weight c'_e, add an edge to G of weight c_e = c'_e. Note all these
c_e are positive. Let M be the maximum c_e. For each pair p = (u_i, v_i), add an
edge e between u_i and v_i of weight c_e = −(M + 1)n^2. Note we have added at
most n^2 edges and increased the maximum weight by a factor of at most
(M + 1)n^2/M, so G is polynomial in the size of G'.

We claim that the cost of the optimal multicut in G' equals the cost of the
optimal clustering in G. A correlation clustering of G that puts every vertex
in its own cluster costs at most Mn^2. However, any solution that does not
separate all pairs costs at least (M + 1)n^2, and so the optimum solution must
separate all pairs. As the only negative edges in G are those between these pairs,
the optimum solution only makes mistakes on positive edges (disagreements
between clusters). Therefore the optimum clustering in G induces a multicut of
the same cost in G'. In fact, any clustering which only makes positive mistakes
induces a multicut of the same cost in G'. Furthermore, any multicut in G' cuts
all negative edges in G and thus induces a clustering in G of the same cost. In
particular, the optimum multicut in G' induces a clustering in G of the same
cost, and the claim follows.

Now suppose we have an r-approximation algorithm for the correlation-clustering
problem. Consider the output of this algorithm on graph G. If the
output clustering only makes mistakes on positive edges (and so separates all
pairs), then the above arguments show that this clustering induces a multicut
which is an r-approximation to the optimal multicut in G'. If the output
clustering does not cut some negative edge, then its cost is at least (M + 1)n^2, so
r · cost(OPT) ≥ (M + 1)n^2. In this case, the clustering which places every node in a
separate cluster costs at most Mn^2 < r · cost(OPT) and is therefore also an
r-approximation. Therefore, cutting every edge in G' is an r-approximation to the
optimal multicut in G'. Thus, given an r-approximation algorithm for the
correlation-clustering problem, we can design an r-approximation algorithm for the
minimum-multicut problem. □
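The two directions of this reduction are easy to state as code. The sketch below is a hypothetical rendering, not code from the paper: vertices are labelled 0..n−1, G' is a list of `(u, v, w)` edges, and terminal pairs are `(u_i, v_i)` tuples. The first function builds the weighted instance G; the second reads a multicut of G' off any clustering of G, falling back to cutting every edge (the all-singletons clustering) when some pair is not separated.

```python
def multicut_to_correlation(n, edges, pairs):
    """Correlation-clustering instance G from the proof of Theorem 2.

    n:     number of vertices of G' (labelled 0..n-1)
    edges: list of (u, v, w) edges of G' with w > 0
    pairs: list of terminal pairs (u_i, v_i) to be separated
    Returns weighted edges (u, v, c_e); parallel edges may occur if a pair
    is also an edge of G'.
    """
    M = max(w for _, _, w in edges)
    heavy = -(M + 1) * n ** 2            # weight of each negative terminal edge
    return list(edges) + [(u, v, heavy) for u, v in pairs]


def clustering_to_multicut(edges, pairs, clusters):
    """Multicut of G' derived from a clustering of G.

    If every terminal pair is separated, cut exactly the G' edges running
    between clusters; otherwise cut every edge of G'.
    """
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    if all(label[u] != label[v] for u, v in pairs):
        return [(u, v, w) for u, v, w in edges if label[u] != label[v]]
    return list(edges)
```

For a path 0-1-2 with weights 2 and 1 and the single pair (0, 2), the instance gains the edge (0, 2) of weight −27, and the clustering {0, 1}, {2} yields the multicut consisting of edge (1, 2).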

Because the minimum-multicut problem is APX-hard, this theorem shows
that there is no PTAS for minimizing disagreements in correlation clustering
unless P = NP. Furthermore, it shows that this problem is as hard as minimum
multicut. The current best approximation for minimum multicut is O(log k) [7].
Because k can be Ω(n^2) in the worst case, an o(log n) approximation for our
problem would require improving the O(log k) approximation for minimum
multicut, which is a long-standing open problem.

The above reduction is also useful in leading us to find difficult instances for
the correlation-clustering problem. Garg, Vazirani, and Yannakakis [7] construct
an example that shows that the ratio between the value of the minimum multicut
and maximum multicommodity flow (i.e., optimal multicut linear-program value)
can be as large as Ω(log k). The example uses a bounded-degree expander.

Definition 1. A graph G is a bounded-degree expander if there is some
constant d such that all nodes have degree at most d and for any set S of vertices
with |S| ≤ n/2, the number of edges that cross S is at least c|S| for some constant c.
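The expansion condition can be checked by brute force on tiny graphs, which helps build intuition for Definition 1. The sketch below (hypothetical function name, exponential running time, toy inputs only) returns the minimum over nonempty S with |S| ≤ n/2 of the number of crossing edges divided by |S|. A family is an expander when this value stays above a fixed constant as n grows; a 6-cycle, used in the checks, gives 2/3, and cycles in general are not expanders because their expansion shrinks like 4/n.

```python
from itertools import combinations


def edge_expansion(n, edges):
    """min over nonempty S, |S| <= n/2, of (#edges crossing S) / |S|.

    n:     number of vertices, labelled 0..n-1
    edges: list of (u, v) pairs
    """
    best = float("inf")
    for size in range(1, n // 2 + 1):
        for subset in combinations(range(n), size):
            s = set(subset)
            crossing = sum(1 for u, v in edges if (u in s) != (v in s))
            best = min(best, crossing / size)
    return best
```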

We can use the same example to prove that the gap of our linear program (the
ratio between OPT and FRAC) can be Ω(log n), suggesting that it is probably
not possible to obtain an o(log n) approximation by rounding this linear program.

Theorem 3. The gap of the linear program presented in Section 2 is Ω(log n)
in the worst case.

Proof. Consider a bounded-degree expander G'. Note that since the degree of each
node is at most d, there are at least n − √n vertices at a distance of at least
(log_d n)/2 from any vertex v. Construct O(n^2) pairs of vertices as follows: for
each vertex v, add the O(n) pairs (v, u) where u is a vertex of distance at least
(log_d n)/2 from v. Assign all edges in the graph weight c_e = 1. Perform the above
reduction to get graph G. As discussed, the optimal integral solution separates
all the O(n^2) pairs of vertices. Hence, any two vertices in a common cluster are at
distance less than (log_d n)/2 in G'. Because the vertices have bounded degree, the
size of the clusters is bounded by n/2. By the expansion property of G', each
cluster S has at least c|S| positive edges leaving it; since each such edge leaves at
most two clusters, we must cut at least (c/2) Σ_{S∈𝒮} |S| = cn/2 positive edges, and
so cost(OPT) = Ω(n).

On the other hand, assigning x_e = 2/log_d n for positive edges and x_e = 1
for negative edges is a feasible fractional solution of value at most (dn/2) ×
(2/log_d n), and so cost(FRAC) = O(n/log n). The theorem follows. □
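The two halves of the proof combine as follows. Feasibility of the fractional solution holds because every negative edge joins vertices at distance at least (log_d n)/2 in G', so every path between them has x-length at least ((log_d n)/2) · (2/log_d n) = 1; extending x to all pairs by the shortest-path distance capped at 1 therefore respects the triangle inequality. Since d is a constant,

$$
\frac{\mathrm{cost}(\mathrm{OPT})}{\mathrm{cost}(\mathrm{FRAC})}
\;\ge\; \frac{\Omega(n)}{(dn/2)\cdot(2/\log_d n)}
\;=\; \frac{\Omega(n)}{dn/\log_d n}
\;=\; \Omega\!\left(\frac{\log_d n}{d}\right) \;=\; \Omega(\log n).
$$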

6 Conclusion 

In this paper, we have investigated the problem of minimizing disagreements
in the correlation-clustering problem. We gave an O(log n) approximation for
general graphs, and an O(r^3) approximation for K_{r,r}-minor-free graphs. We also
showed that this problem is as hard as minimum multicut, and that the natural
linear-programming formulation has a gap of Ω(log n).

A natural extension of this work would be to improve the approximation 
factor for minimizing disagreements. Given our hardness result and the history 
of the minimum-multicut problem, this goal is probably very difficult. Another
option is to improve the lower bound, but for the same reason, this goal is 
probably very difficult. On the other hand, one might try to design an alternate 
O(log n)-approximation algorithm that is combinatorial, perhaps by counting
“bad” cycles in a cycle cover of the graph. 

Another interesting direction is to explore other objective functions of the 
correlation-clustering problem. Bansal et al. [1] give a PTAS for maximizing 
agreements in complete graphs with edge weights in {−1, +1}. In maximizing
agreements, the cost of a solution is the weight of positive agreements (uncut 
positive edges) plus negative agreements (cut negative edges). They also mention 
the objective of maximizing agreements minus disagreements. This objective 
is of particular practical interest. However, there are no known approximation 
algorithms for this objective, even for complete graphs. 

Finally, it would be interesting to apply the techniques presented here to other problems. The region-growing technique and the rounding technique of Klein et al. [12] both provide a radius guarantee on the output clusters. Many papers have used this radius guarantee to demonstrate that the solution is feasible, i.e., that it satisfies the constraints. Inspired by Bejerano et al. [2], we also use the radius guarantee to bound the approximation factor. This idea might be applicable to other problems.

Acknowledgements. Many thanks go to Shuchi Chawla, Avrim Blum, Mohammad Mahdian, David Liben-Nowell, and Grant Wang. Many results in this paper were inspired by conversations with Seffi Naor.
Abstract. The weighted matching problem is to find a matching of maximum weight in a weighted graph. The fastest known algorithm for this problem has running time O(nm + n² log n). Many real-world problems require graphs so large that this running time is too costly. We present a linear time approximation algorithm for the weighted matching problem with a performance ratio of 2/3 − ε. This improves the previously best performance ratio of 1/2.



1 Introduction 

A matching M in a graph G = (V,E) is a subset of the edges of G such that no two edges in M are adjacent. In a graph G = (V,E) with edge weights given by a function w : E → ℝ+, the weight of a matching is defined as w(M) := Σ_{e∈M} w(e). The weighted matching problem is to find a matching M in G that has maximum weight. The first polynomial time algorithm for the weighted matching problem is due to Edmonds [4]. A straightforward implementation of this algorithm requires a running time of O(n²m), where n and m denote the number of vertices and edges in the graph. Lawler [8] and Gabow [6] improved the running time to O(n³). The fastest known algorithm to date for the weighted matching problem in general graphs is due to Gabow [7] and has a running time of O(nm + n² log n).

Many real-world problems require graphs so large that the runtime of Gabow's algorithm is too costly. Examples of such problems are the refinement of FEM nets [9], the partitioning problem in VLSI design [10], and the gossiping problem in telecommunications [2]. There are also applications where the weighted matching problem has to be solved extremely often on only moderately large graphs. One example is the virtual screening of protein databases containing the three-dimensional structure of the proteins [5]. The graphs appearing in such applications have only about 10,000 edges. However, the weighted matching problem has to be solved more than 100,000,000 times for a complete database scan.

Therefore, there is considerable interest in approximation algorithms for the 
weighted matching problem that are very fast, have ideally linear runtime, and 
that nevertheless produce very good results even if these results are not optimal. 

* supported by DFG research grant 296/6-3 

S. Arora et al. (Eds.): APPROX 2003 + RANDOM 2003, LNCS 2764, pp. 14-23, 2003.

© Springer-Verlag Berlin Heidelberg 2003
The quality of an approximation algorithm for the weighted matching problem is measured by its so-called performance ratio. An approximation algorithm has a performance ratio of c if, for all graphs, it finds a matching whose weight is at least c times the weight of an optimal solution.

A simple approximation algorithm for the weighted matching problem with performance ratio 1/2 is obtained by the following greedy approach [1]: start with an empty matching and, in each step, extend it by the heaviest edge currently available. The running time of this algorithm is O(m log n), because it requires sorting the edges by decreasing weight. The first linear time 1/2-approximation algorithm for the weighted matching problem was proposed by Preis [11], using the idea of locally heaviest edges. Drake and Hougardy [3] obtained a simpler linear time approximation algorithm with the same performance ratio by using a completely different approach. In this paper we improve these results. We prove the existence of linear time approximation algorithms for the weighted matching problem whose approximation ratios are arbitrarily close to 2/3.
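The greedy 1/2-approximation described above can be sketched as follows. This is a minimal illustration that assumes edges are given as (u, v, weight) triples. The name `greedy_matching` is hypothetical. The sort is what makes the running time O(m log n) rather than linear.

```python
from typing import Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (u, v, weight)


def greedy_matching(edges: List[Edge]) -> List[Edge]:
    """Repeatedly take the heaviest edge whose two endpoints are still free."""
    matched = set()          # vertices already covered by the matching
    matching: List[Edge] = []
    # Sorting by decreasing weight dominates: O(m log m) = O(m log n).
    for u, v, wt in sorted(edges, key=lambda e: e[2], reverse=True):
        if u != v and u not in matched and v not in matched:
            matching.append((u, v, wt))
            matched.update((u, v))
    return matching


# Path a-b-c-d: greedy takes the middle edge (weight 3), the optimum {ab, cd} weighs 4.
path = [("a", "b", 2.0), ("b", "c", 3.0), ("c", "d", 2.0)]
print(sum(wt for _, _, wt in greedy_matching(path)))  # 3.0
```

The path example shows why the ratio cannot exceed 1/2 by much. Greedy commits to the heaviest edge and thereby blocks both neighbours.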

Main Theorem. For each ε > 0 there exists a linear time approximation algorithm for the weighted matching problem with a performance ratio of 2/3 − ε.

The main idea of our algorithm is to start with a maximal matching M and to increase its weight by local changes. We call these local changes short augmentations. Each one adds at most two new edges to M and removes up to four edges of M. A graph can possess a superlinear number of short augmentations. To achieve linear running time, the algorithm can look at only some of them. For each edge of the maximal matching M, our algorithm looks only at the short augmentations that involve the endpoints of this edge. This way the short augmentations considered by the algorithm are, in some sense, spread evenly over the graph, and their number is linearly bounded.

Because the short augmentations partly overlap, performing one short augmentation can make several others unavailable. For the performance ratio it is therefore important to be able to reject short augmentations that improve the weight of the matching only slightly. We achieve this by considering only short augmentations that gain at least some constant factor β and that, in addition, yield the largest possible gain among all these. We call such augmentations β-augmentations. It does not seem possible to find the best β-augmentation in linear time. However, we will show that a constant factor approximation of the best β-augmentation can be found in linear time.

To prove the performance ratio of our algorithm we use an amortized analysis. The gain achieved by an augmentation is not realized immediately. Instead, part of it is stored in certain edges of the graph for later use. This way we can prove that the algorithm increases the weight of the given matching by some constant factor. We repeat the algorithm a constant number of times and choose β > 1 sufficiently close to 1. The resulting matching then has a weight arbitrarily close to 2/3 · w(M_opt).

The paper is organized as follows. In Section 2 we give basic definitions. In Section 3 we define short augmentations and use them to prove the existence of the set of local improvements on which our algorithm is based. In Section 4 we present the algorithm and prove that its performance ratio is 2/3 − ε for any ε > 0.

2 Preliminaries 

Let G = (V,E) be a weighted graph with weight function w : E → ℝ+. For a subset F ⊆ E the weight of F is defined as w(F) := Σ_{e∈F} w(e). A matching M ⊆ E is called maximal if no proper superset of M in E is a matching. We denote by M_opt a maximum weight matching in G, i.e., a matching that satisfies w(M_opt) ≥ w(M) for all other matchings M. A path or cycle is called M-alternating if it alternately uses edges from M and from E \ M. Note that alternating cycles must contain an even number of edges. Let P be an alternating path with the following property: if P ends in an edge not belonging to M, then that endpoint of P is not covered by an edge of M. The path P is called M-weight-augmenting if

$$ w(E(P) \cap M) < w(E(P) \setminus M) . $$

If P is an M-weight-augmenting path, then M △ P (the symmetric difference of M and P) is again a matching, with strictly larger weight than M. M-weight-augmenting cycles are defined similarly. More generally, an augmentation is any process that replaces some edges of a matching M by some new edges and increases the weight of the matching.
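The following sketch tests the weight-augmenting condition and forms M △ P. It stores edges as frozensets of endpoints, which is a representation chosen here for illustration. The helper names are hypothetical. The function does not check that P is alternating or that its endpoints are free; it only re-checks that the result is still a matching.

```python
from typing import Dict, FrozenSet, Iterable, Set

Edge = FrozenSet[str]


def edge(u: str, v: str) -> Edge:
    return frozenset((u, v))


def is_weight_augmenting(path: Iterable[Edge], M: Set[Edge], w: Dict[Edge, float]) -> bool:
    """Compare the weight of the path edges in M with the weight of those not in M."""
    path = list(path)
    in_m = sum(w[e] for e in path if e in M)
    out_m = sum(w[e] for e in path if e not in M)
    return in_m < out_m


def augment(M: Set[Edge], path: Iterable[Edge]) -> Set[Edge]:
    """Return the symmetric difference of M and P, checking it is still a matching."""
    new_m = set(M) ^ set(path)
    seen: Set[str] = set()
    for e in new_m:
        if seen & e:
            raise ValueError("symmetric difference is not a matching")
        seen |= e
    return new_m


M = {edge("b", "c")}
w = {edge("a", "b"): 3.0, edge("b", "c"): 2.0, edge("c", "d"): 3.0}
P = [edge("a", "b"), edge("b", "c"), edge("c", "d")]
print(is_weight_augmenting(P, M, w))              # True: 2.0 < 6.0
print(sorted(sorted(e) for e in augment(M, P)))   # [['a', 'b'], ['c', 'd']]
```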

3 Short Augmentations 

A weight-augmenting path or cycle with respect to a matching M is called short if it contains at most two edges not belonging to M. By this definition, the only weight-augmenting short cycle is an alternating cycle of length four, and there are six different types of weight-augmenting short paths. The following result shows that it is indeed enough to consider such short augmenting paths and cycles to obtain a 2/3-approximation of the maximum weight matching.

Lemma 1. For any matching M there exists a node-disjoint set of weight-augmenting short paths and cycles such that augmenting along all these paths and cycles results in a matching of weight at least 2/3 · w(M_opt).

Proof. Consider the symmetric difference M △ M_opt. It consists of even-length alternating cycles and of alternating paths. Order these paths and cycles arbitrarily, and number the edges of M_opt in the order in which they appear in these paths and cycles. Now partition M_opt into three sets by taking the edge numbers modulo 3. Removing any one of these three sets from M △ M_opt yields a set of alternating paths and cycles, each of which contains at most two edges of M_opt. The lightest of the three sets has weight at most w(M_opt)/3, so removing it shows the following: M can be augmented to a matching of weight at least 2/3 · w(M_opt) by paths and cycles, each of which contains at most two edges not in M. □
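The modulo-3 step of the proof can be checked numerically. The sketch assumes the M_opt edges of M △ M_opt are already numbered consecutively along the paths and cycles, and it takes only their weights. It checks the weight bound only and does not rebuild the resulting paths. The function name is hypothetical.

```python
from typing import List, Tuple


def drop_lightest_residue_class(opt_weights: List[float]) -> Tuple[int, List[float]]:
    """opt_weights[i] is the weight of the i-th numbered M_opt edge of M (sym. diff.) M_opt."""
    class_weight = [0.0, 0.0, 0.0]
    for i, wt in enumerate(opt_weights):
        class_weight[i % 3] += wt          # partition by edge number modulo 3
    lightest = min(range(3), key=lambda r: class_weight[r])
    kept = [wt for i, wt in enumerate(opt_weights) if i % 3 != lightest]
    return lightest, kept


weights = [4.0, 1.0, 5.0, 2.0, 6.0, 3.0]
r, kept = drop_lightest_residue_class(weights)
# The lightest class weighs at most a third, so the kept edges weigh at least 2/3.
print(r, sum(kept), sum(kept) >= 2 * sum(weights) / 3)  # 0 15.0 True
```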
In the following we need the notion of a β-augmentation. Fix a constant β > 1. A β-augmentation of a matching M is an augmentation in which the weight of the edges added to M is at least β times the weight of the edges removed from M. The following result shows that, for β > 1 close enough to 1, any matching M can be augmented to a matching of weight close to 2/3 · w(M_opt) by short paths and cycles that are each a β-augmentation.

Lemma 2. Let M be an arbitrary matching and let β > 1 be constant. Then there exists a node-disjoint set of weight-augmenting short paths and cycles, each of which is a β-augmentation, such that augmenting along all these paths and cycles results in a matching of weight at least 2/(3β) · w(M_opt).

Proof. By Lemma 1 there exists a node-disjoint set of augmenting paths and cycles with the following properties: each contains at most two edges not in M, and augmenting along all of them results in a matching M̄ of weight at least 2/3 · w(M_opt). We claim that taking only the subset of these augmenting paths and cycles that are β-augmentations yields a matching of the desired weight.

Partition M̄ into two sets M̄_{≥β} and M̄_{<β}. The set M̄_{≥β} contains all edges of M̄ obtained by β-augmentations, and M̄_{<β} contains all other edges of M̄. Partition M similarly into two sets M_{<β} and M_{≥β}, according to the augmenting paths and cycles in M △ M̄ that contain these edges. Performing only the β-augmentations yields the matching M_{<β} ∪ M̄_{≥β}. Every augmenting path or cycle that is not a β-augmentation adds less than β times the weight it removes, so w(M̄_{<β}) < β · w(M_{<β}). Since β > 1, the weight of the resulting matching can be bounded from below as follows:

$$ w(M_{<\beta}) + w(\bar M_{\ge\beta}) \ge \frac{1}{\beta}\, w(\bar M_{<\beta}) + w(\bar M_{\ge\beta}) \ge \frac{1}{\beta}\, w(\bar M) \ge \frac{2}{3\beta}\, w(M_{opt}) . $$

□

Let M be an arbitrary matching, and let M̄ be a matching of weight at least 2/3 · w(M_opt) such that the symmetric difference of M and M̄ consists of weight-augmenting short paths and cycles. Lemma 1 guarantees the existence of M̄. For each cycle or path in M △ M̄, choose an edge in M that is adjacent to all edges of M̄ in this path or cycle. Call the set of all chosen edges M*. For each edge e ∈ M*, denote by S_e the (at most two) edges of M̄ that are adjacent to e. For an arbitrary set F of edges, denote by inc(F) all edges in M that are incident to the endpoints of edges in F. Then inc(S_e) contains at most three edges of M, and S_e ∪ inc(S_e) is the edge set of the path or cycle in M △ M̄ that contains e.
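The sets S_e and inc(F) can be sketched as below. Edges are frozensets of endpoints, and the helper names `s_e` and `inc` are hypothetical. In the example, the short path a-b, b-c, c-d, d-e, e-f has M-edges ab, cd, ef (plus an unrelated M-edge gh) and M̄-edges bc, de. The chosen centre edge is cd.

```python
from typing import FrozenSet, Set

Edge = FrozenSet[str]


def edge(u: str, v: str) -> Edge:
    return frozenset((u, v))


def inc(F: Set[Edge], M: Set[Edge]) -> Set[Edge]:
    """inc(F): the edges of M incident to an endpoint of some edge in F."""
    endpoints = {x for f in F for x in f}
    return {g for g in M if g & endpoints}


def s_e(e: Edge, M_bar_edges: Set[Edge]) -> Set[Edge]:
    """S_e: the edges of M-bar that share an endpoint with the centre edge e."""
    return {f for f in M_bar_edges if f != e and f & e}


M = {edge("a", "b"), edge("c", "d"), edge("e", "f"), edge("g", "h")}
M_bar_edges = {edge("b", "c"), edge("d", "e")}
S = s_e(edge("c", "d"), M_bar_edges)
print(S == M_bar_edges, len(inc(S, M)))  # True 3  (ab, cd, ef; gh is not incident)
```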

For a given constant β > 1, we partition M* into two subsets M*_{<β} and M*_{≥β}. The subset M*_{<β} contains all edges e of M* with w(S_e) < β · w(inc(S_e)), and M*_{≥β} contains all other edges of M*.

The following result shows the following: if an algorithm achieves at least a constant fraction of the value (1/β) · w(S_e) − w(inc(S_e)) for all e ∈ M*_{≥β}, then it improves a given matching by a constant factor.
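Lemma 3. Let M be a matching of weight w(M) ≥ α · w(M_opt). Suppose the weight of the matching M' exceeds the weight of M by at least ε · Σ_{e∈M*_{≥β}} ((1/β) · w(S_e) − w(inc(S_e))). Then

$$ w(M') \ge \Big(\alpha + \varepsilon\Big(\frac{2}{3\beta} - \alpha\Big)\Big) \cdot w(M_{opt}) . $$

Proof. By the definition of M*_{<β} we have w(inc(S_e)) > (1/β) · w(S_e) for e ∈ M*_{<β}. We also have w(M) = Σ_{e∈M*} w(inc(S_e)). Applying these two facts, we get

$$
\begin{aligned}
w(M') &\ge w(M) + \varepsilon \sum_{e\in M^*_{\ge\beta}} \Big(\frac{1}{\beta}\, w(S_e) - w(\mathrm{inc}(S_e))\Big) \\
&= (1-\varepsilon)\, w(M) + \varepsilon \sum_{e\in M^*_{<\beta}} w(\mathrm{inc}(S_e)) + \varepsilon \sum_{e\in M^*_{\ge\beta}} \frac{1}{\beta}\, w(S_e) \\
&\ge (1-\varepsilon)\, w(M) + \varepsilon \sum_{e\in M^*_{<\beta}} \frac{1}{\beta}\, w(S_e) + \varepsilon \sum_{e\in M^*_{\ge\beta}} \frac{1}{\beta}\, w(S_e) \\
&= (1-\varepsilon)\, w(M) + \frac{\varepsilon}{\beta} \sum_{e\in M^*} w(S_e) \\
&\ge (1-\varepsilon)\, \alpha\, w(M_{opt}) + \frac{\varepsilon}{\beta}\cdot\frac{2}{3}\, w(M_{opt}) \\
&= \Big(\alpha + \varepsilon\Big(\frac{2}{3\beta} - \alpha\Big)\Big)\, w(M_{opt}) .
\end{aligned}
$$

□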



4 The Algorithm 

For the algorithm we have to extend the notion of weight-augmenting short paths and cycles slightly. Let S ⊆ E be a set of at most two non-adjacent edges such that some edge e in E is adjacent to all edges in S. Remove from a matching M' all edges that are adjacent to some edge in S, and then add S; this yields a new matching M''. If w(M'') > w(M'), we say that S is a (short) augmenting set centered at e with respect to M'. Since S contains at most two edges, at most four edges of M' are removed. Note that the sets S_e introduced in Section 3 are augmenting sets centered at e with respect to M.

For the description of the algorithm and the proof of its performance ratio we need some additional definitions. Let M denote the maximal matching that the algorithm begins with; this is the matching that defines the sets S_e as described in Section 3. Let M' denote the matching that the algorithm continuously updates by means of augmentations. Let aug(e) denote the set of edges that the algorithm chooses for a β-augmentation at e ∈ M. For an arbitrary set F of edges, denote by inc'(F) all edges in M' that are incident to the endpoints of edges in F.



Algorithm improve_matching (G = (V,E), w : E → ℝ+, M)

    make M maximal
    M' := M
    for e ∈ M do begin
        if there exists a β-augmentation with center e
        then augment M' by a good β-augmentation with center e
    end
    return M'



Fig. 1. Algorithm improve_matching for increasing the weight of a matching.



Figure 1 shows the algorithm, which we call improve_matching. Starting from a maximal matching M, the algorithm visits each edge e ∈ M exactly once. For each e ∈ M, it determines whether there is any β-augmenting set centered at e in M'. If there is none, the algorithm moves on to the next edge in M. Otherwise, it tries to find the best β-augmenting set centered at e. The gain of an augmenting set S is defined as w(S) − w(inc'(S)), which is the amount by which the weight of M' increases when we augment along S. The best β-augmenting set centered at e is the β-augmenting set centered at e with the largest gain.

However, the algorithm is not guaranteed to find the best β-augmenting set centered at e. Rather, it finds a good β-augmenting set at e. A good β-augmenting set centered at e is a β-augmenting set centered at e whose gain is at least δ times the gain of the best β-augmenting set centered at e. For technical reasons, we assume from now on that β > 1 stays below a fixed constant bound. This is no restriction, because in the end β will turn out to be very close to 1.
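The loop of Fig. 1 can be sketched by brute force on small graphs, as below. This is a simplified, hypothetical illustration and not the linear-time algorithm. At each centre edge it enumerates every candidate set S of one or two non-adjacent edges touching e and takes the best β-augmenting one, whereas the paper only needs a good one. The enumeration is quadratic in the degrees. Edges are frozensets of endpoints, and "make M maximal" is done greedily.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Edge = FrozenSet[str]  # an edge {u, v} stored as a frozenset of its endpoints


def improve_matching(edges: List[Edge], w: Dict[Edge, float],
                     M: Set[Edge], beta: float) -> Set[Edge]:
    """Simplified, brute-force version of Fig. 1 (not linear time)."""
    M = set(M)
    for f in edges:                      # "make M maximal": add edges with both ends free
        if not any(f & g for g in M):
            M.add(f)
    current = set(M)                     # M' := M
    for e in M:                          # visit every edge of the initial matching once
        # Candidates for S: not in M', different from e, sharing an endpoint with e.
        cands = [f for f in edges if f not in current and f != e and f & e]
        best = None
        for k in (1, 2):
            for S in combinations(cands, k):
                if k == 2 and S[0] & S[1]:
                    continue             # the edges of S must be non-adjacent
                touched = set().union(*S)
                removed = {g for g in current if g & touched}   # inc'(S)
                added_w = sum(w[f] for f in S)
                removed_w = sum(w[g] for g in removed)
                # beta-augmenting: strictly heavier, and by at least the factor beta
                if added_w > removed_w and added_w >= beta * removed_w:
                    gain = added_w - removed_w
                    if best is None or gain > best[0]:
                        best = (gain, S, removed)
        if best is not None:             # augment M' by the chosen set
            _, S, removed = best
            current = (current - removed) | set(S)
    return current


ab, bc, cd = frozenset("ab"), frozenset("bc"), frozenset("cd")
w = {ab: 3.0, bc: 2.0, cd: 3.0}
print(improve_matching([ab, bc, cd], w, {bc}, beta=1.1) == {ab, cd})  # True
```

In the example, the set {ab, cd} centered at bc gains 4 and beats the single-edge sets, which gain 1 each.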

Figure 2 shows our algorithm for finding a good β-augmentation. It takes an edge e as input and returns a good β-augmenting set centered at e, if any such set exists. We need a few more definitions to describe the algorithm. For an arbitrary edge x, let w_M(x) be 0 if x ∉ M, and let w_M(x) = w(x) otherwise. Label the endnodes of e arbitrarily as left and right. Then any edge a ∉ M' that is incident to left, together with inc'(a) \ {e}, forms a left arm of e. A right arm of e is defined symmetrically. Consider an arm of e that consists of a ∉ M' together with inc'(a) \ {e}; its gain is defined as gain_a := w_M(a) − w_M(inc'(a) \ {e}). For an arm of e that consists of just a ∉ M', the gain is defined in the obvious way as gain_a := w_M(a).

A left arm a ∪ (inc'(a) \ {e}) is allowable if there exists a right arm b ∪ (inc'(b) \ {e}) such that a ∪ b, or a alone, forms a β-augmenting set at e. We calculate the allowable left arms as follows. First, we calculate the greatest surplus among the right arms, where the surplus of the right arm b ∪ (inc'(b) \ {e}) is defined as w_M(b) − β · (w_M(inc'(b) \ {e}) + w_M(e)).
Algorithm good-β-augmentation (G = (V,E), w : E → ℝ+, e ∈ E)

    find the right and left arms of e
    determine the gains and surpluses of the left and right arms
    left := largest left allowable arm and its best extension
    right := largest right allowable arm and its best extension
    if left = ∅ and right = ∅
        then return ∅
        else return max(left, right)



Fig. 2. Algorithm for finding a good β-augmentation.



The largest surplus among all right arms, denoted surp_r, can be thought of as the maximum value that a right arm can loan to a left arm to make it part of a β-augmenting set at e. A left arm is allowable if and only if w_M(a) − β · w_M(inc'(a) \ {e}) + surp_r ≥ 0. The allowable right arms and the maximum left surplus surp_l are defined symmetrically, and they are calculated in the same way.

Once the algorithm has calculated the left and right allowable arms of e, it chooses among them the arm with the largest gain. Without loss of generality, assume it is a left arm, and let a ∉ M' be the uncovered edge in this arm. The algorithm then returns the best β-augmenting set centered at e that contains a.
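The surplus and allowability tests above can be sketched as follows, working on pre-computed arms. Several simplifying assumptions are labelled here.

- Each arm is a hypothetical pair (w(a), total weight of the M'-edges other than e that a displaces).
- The two arms of a pair never displace a common edge.
- e is still in M'.
- Plain weights w stand in for the w_M notation of the text.

An empty side is modelled by the surplus −β · w(e), which covers the case "a alone".

```python
from typing import List, Optional, Tuple

Arm = Tuple[float, float]  # (w(a), weight of the M'-edges other than e displaced by a)


def good_beta_augmentation(left: List[Arm], right: List[Arm], w_e: float,
                           beta: float) -> Optional[Tuple[float, Arm, Optional[Arm]]]:
    """Sketch of Fig. 2 on pre-computed arms; returns (gain, arm, partner) or None."""
    def surplus(arm: Arm) -> float:
        w_a, rem = arm
        return w_a - beta * (rem + w_e)

    # Largest surplus on each side; -beta * w_e stands for "no arm on that side".
    surp = {"left": max([surplus(a) for a in left] + [-beta * w_e]),
            "right": max([surplus(b) for b in right] + [-beta * w_e])}

    # Allowable: w(a) - beta * rem(a) + (largest surplus on the other side) >= 0.
    allowable = []
    for side, arms, other in (("left", left, "right"), ("right", right, "left")):
        for arm in arms:
            if arm[0] - beta * arm[1] + surp[other] >= 0:
                allowable.append((arm[0] - arm[1], side, arm))  # (gain of the arm, ...)
    if not allowable:
        return None

    # Take the allowable arm of largest gain, then its best beta-augmenting extension.
    _, side, arm = max(allowable, key=lambda t: t[0])
    partners = right if side == "left" else left
    best = None
    for partner in [None] + partners:
        added = arm[0] + (partner[0] if partner else 0.0)
        removed = arm[1] + (partner[1] if partner else 0.0) + w_e
        if added >= beta * removed and added > removed:
            if best is None or added - removed > best[0]:
                best = (added - removed, arm, partner)
    return best


print(good_beta_augmentation([(5.0, 1.0), (2.0, 3.0)], [(4.0, 1.0)], w_e=2.0, beta=1.1))
# (5.0, (5.0, 1.0), (4.0, 1.0))
```

Both surpluses are computed once per centre edge, so each arm is tested in constant time. This mirrors the claim that the work is proportional to the degrees of the endpoints of e.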

Lemma 4. If there exists a β-augmenting set centered at e, then the algorithm good-β-augmentation (Figure 2) returns a good β-augmenting set centered at e. The running time is proportional to the sum of the degrees of the end-vertices of e.

Proof (sketch). Suppose the largest possible gain of an arm is larger than twice the weight of e. Then it is easy to see that the algorithm finds a β-augmentation that achieves a constant fraction of the largest possible gain. The reason is that the best β-augmentation then gains at most 3 times the weight of e, while the algorithm finds a β-augmentation that gains at least the weight of e.

In the other case, we use the fact that the algorithm finds a β-augmentation that does not share an arm with the best possible β-augmentation. From this, one can show that β − 1 must be sufficiently small for the scaled gain found by the algorithm to be big enough. For β in the range assumed above, the latter is larger. □



Lemma 5. The algorithm improve_matching improves the matching M by at least

$$ \varepsilon \cdot \sum_{e\in M^*_{\ge\beta}} \Big(\frac{1}{\beta}\, w(S_e) - w(\mathrm{inc}(S_e))\Big) . $$

Proof. Define ε := min{δ/2, (β − 1)/(4β)}. The algorithm visits every e ∈ M, and hence every e ∈ M*_{≥β}. If it finds any β-augmenting set for e, it also finds and augments a β-augmenting set aug(e) whose gain is at least δ times the gain of the currently best β-augmenting set at e. The algorithm cannot distinguish between the edges of M*_{<β} and M*_{≥β}, nor does it know the previously defined β-augmenting sets S_e. Nevertheless, we show by means of amortization that for each of these sets S_e it finds a constant proportion of (1/β) · w(S_e) − w(inc(S_e)). The idea is as follows. For each e ∈ M*_{≥β}, either the algorithm can find an augmenting set in M' that is as good as S_e, or M' has already increased by enough weight that a constant proportion of this gained weight can be assigned to e.

The amortized analysis works as follows. When the algorithm augments at e ∈ M*_{≥β}, M' either gains a constant proportion of (1/β) · w(S_e) − w(inc(S_e)) right away, or M' can additionally withdraw weight that was added to its savings in the past. In both cases the total win for M' is some constant proportion of (1/β) · w(S_e) − w(inc(S_e)).

M''s savings are built up as follows. For each e ∈ M \ M*_{≥β}, M' is charged all of the augmentation that the algorithm finds at e, and this amount is put into savings. This is not a problem, because the lemma only accounts for the sets S_e with e ∈ M*_{≥β}. If e ∈ M*_{≥β}, then M' keeps 1/2 of the augmentation that the algorithm finds at e, and the other half is charged to savings. Once this is done for all e ∈ M, M' can later make withdrawals from savings when necessary. A withdrawal is necessary, for instance, when one needs to augment at e ∈ M*_{≥β} but the edges incident to S_e all have greater weight in M' than they had in M.

Let E ⊆ (M' \ M) denote the set of new edges incident with the nodes of S_e for some such e ∈ M*_{≥β}. Then a withdrawal of ((β − 1)/(4β)) · w(E) from M''s savings can be made for the augmentation at e. This coefficient is the product of two factors:

- The factor (β − 1)/(2β) holds because the edges in E were added to M' during earlier β-augmentations. A β-augmentation that adds edges of weight W removes at most W/β, so it gains at least ((β − 1)/β) · W. At least half of every gain is put into savings.
- The factor 1/2 holds because each edge in E has two endnodes. Since the sets S_e are node-disjoint, each edge in E can be involved in at most two withdrawals.

More concretely, when the algorithm visits e ∈ M*_{≥β}, there are three possibilities for the set S_e:

1. S_e is still β-augmenting in M' with w(inc'(S_e)) ≤ w(inc(S_e)).
2. S_e is still β-augmenting in M' with w(inc'(S_e)) > w(inc(S_e)).
3. S_e is no longer β-augmenting in M'.

In the first case, S_e is still β-augmenting with w(inc'(S_e)) ≤ w(inc(S_e)) when the algorithm visits e ∈ M*_{≥β}. The algorithm always finds a β-augmentation aug(e) whose gain is at least δ times the gain of the largest β-augmentation at e in M'. After M' has been charged 1/2 of the augmentation found at e, the weight of M' therefore increases by

$$ \frac{1}{2}\big(w(\mathrm{aug}(e)) - w(\mathrm{inc}'(\mathrm{aug}(e)))\big) \ge \frac{\delta}{2}\big(w(S_e) - w(\mathrm{inc}'(S_e))\big) \ge \frac{\delta}{2}\big(w(S_e) - w(\mathrm{inc}(S_e))\big) . $$
In the second case, S_e is still β-augmenting with w(inc'(S_e)) > w(inc(S_e)) when the algorithm visits e ∈ M*_{≥β}. Let A denote the set inc'(S_e) \ inc(S_e). The set A contains only new edges, i.e., edges that were added in augmentations. Therefore M''s savings have increased in the past by at least ((β − 1)/(2β)) · w(A), and at least half of this can be withdrawn by M'. In addition, M' keeps one half of the augmentation that the algorithm finds at e. Together, M''s total win at e is at least

$$
\begin{aligned}
&\frac{1}{2}\big(w(\mathrm{aug}(e)) - w(\mathrm{inc}'(\mathrm{aug}(e)))\big) + \frac{\beta-1}{4\beta}\, w(A) \\
&\quad\ge \frac{\delta}{2}\big(w(S_e) - w(\mathrm{inc}'(S_e))\big) + \frac{\beta-1}{4\beta}\big(w(\mathrm{inc}'(S_e)) - w(\mathrm{inc}(S_e))\big) \\
&\quad\ge \min\Big\{\frac{\delta}{2}, \frac{\beta-1}{4\beta}\Big\}\big(w(S_e) - w(\mathrm{inc}'(S_e)) + w(\mathrm{inc}'(S_e)) - w(\mathrm{inc}(S_e))\big) \\
&\quad= \min\Big\{\frac{\delta}{2}, \frac{\beta-1}{4\beta}\Big\}\big(w(S_e) - w(\mathrm{inc}(S_e))\big) .
\end{aligned}
$$



In the third and final case, S_e is no longer β-augmenting when the algorithm visits e ∈ M*_{≥β}, i.e., w(inc'(S_e)) > (1/β) · w(S_e). Therefore, the set of edges A = inc'(S_e) \ inc(S_e) has weight w(A) > (1/β) · w(S_e) − w(inc(S_e)). M''s savings must then have increased by at least ((β − 1)/(2β)) · w(A), and at least 1/2 of this can be withdrawn by M'. So, regardless of whether the algorithm finds a β-augmenting set at e in M', M' gets a total win in this step of at least

$$ \frac{\beta-1}{4\beta}\, w(A) \ge \frac{\beta-1}{4\beta}\Big(\frac{1}{\beta}\, w(S_e) - w(\mathrm{inc}(S_e))\Big) . $$



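In the first two cases, we use w(S_e) − w(inc(S_e)) ≥ (1/β) · w(S_e) − w(inc(S_e)). Over all three cases, the weight of M' therefore increases at each e ∈ M*_{≥β} by at least min{δ/2, (β − 1)/(4β)} · ((1/β) · w(S_e) − w(inc(S_e))). This proves the lemma, since we defined ε as min{δ/2, (β − 1)/(4β)}.

□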



Theorem 1. If M is any matching with w(M) ≥ α · w(M_opt), then applying the algorithm improve_matching yields a matching M' with weight at least (α + ε(2/(3β) − α)) · w(M_opt).

Proof. This is an immediate consequence of Lemma 3 and Lemma 5. □

We are now able to prove the main theorem. 

Main Theorem. For each ε > 0 there exists a linear time approximation algorithm for the weighted matching problem with a performance ratio of 2/3 − ε.
Proof. By Theorem 1, repeating algorithm improve_matching yields a matching with weight arbitrarily close to 2/(3β) · w(M_opt). By choosing β > 1 small enough, we get a matching with weight arbitrarily close to 2/3 · w(M_opt). Note that β and the number of repetitions of algorithm improve_matching are constants depending on ε. Since the algorithm improve_matching has linear running time, the total running time stays linear. □
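The number of repetitions can be illustrated by iterating the bound of Theorem 1, α ← α + ε(2/(3β) − α). Its gap to 2/(3β) shrinks by the factor (1 − ε) per round, so the count depends only on the constants and not on the graph. In this sketch, `eps` stands for the per-round constant of Lemma 5, not the ε of the Main Theorem. The starting ratio α0 = 1/2 (for example from the greedy algorithm) and the function name are hypothetical.

```python
def rounds_needed(alpha0: float, beta: float, eps: float, target: float) -> int:
    """Iterate alpha <- alpha + eps * (2/(3*beta) - alpha) until alpha >= target."""
    limit = 2.0 / (3.0 * beta)
    if not (target < limit and 0.0 < eps <= 1.0):
        raise ValueError("need target < 2/(3*beta) and 0 < eps <= 1")
    alpha, rounds = alpha0, 0
    while alpha < target:
        alpha += eps * (limit - alpha)   # one application of improve_matching
        rounds += 1
    return rounds


print(rounds_needed(alpha0=0.5, beta=1.01, eps=0.5, target=0.6))  # 2
```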
Abstract. A tree cover of a graph G is a collection of trees whose union includes all the vertices of G. The cost of a tree cover is the weight of the maximum weight tree in the tree cover. Given a positive integer k, the k-tree cover problem is to compute a minimum cost tree cover with no more than k trees. Star covers are defined analogously. We may additionally be given a set of k vertices that are to serve as roots of the trees or stars. In this paper, we provide constant factor approximation algorithms for finding tree and star covers of graphs, in both the rooted and un-rooted versions.



1 Introduction 

This paper was motivated by the following “Nurse station location” problem. 
A hospital wanted to locate k nurses in its coverage area. Each nurse would be 
assigned a certain set of patients, whom she would visit in her morning rounds.
The objective is to figure out where to locate the nurse stations and how to 
assign patients to nurses so that the last completion time is minimized. 

This problem is equivalent to covering a metric graph with no more than 
k tours, so that the maximum length of a tour is minimized. Since minimum 
spanning trees are constant factor approximations to traveling salesperson tours, 
we look at covering the graph with no more than k spanning trees, so that the 
maximum weight of a tree is minimized. A variant is when the nurse has to return 
to her station before visiting each patient, in order to pick up equipment and 
supplies necessary for that patient. In that case, we get the problem of covering 
the graph with stars. If the hospital has already built its nursing stations and 
only wants to assign patients to nurses, we get the rooted versions of these 
problems. The problems are defined formally in the next section. 
Previous and related work. The problems studied in this paper are closely related to those studied by Arkin, Hassin, and Levin [1]. The problems they deal with include covering the nodes of a graph, or a subset of the edges of a graph, by paths, walks, or stars. Most of their approximation algorithms minimize the number of covering objects (e.g., paths) subject to a constraint on the cost of each covering object. They also consider unrooted versions of k-path covers and k-walk covers. The algorithms in [1] do not seem to extend to rooted versions.

These problems fall in the general class of "vehicle routing" problems (see [16] for a recent survey). In the k-traveling salesperson problem, a feasible solution consists of k tours that cover the nodes, where the tours share the same depot (i.e., starting and ending point). The objective is to minimize the total length of the tours. The first constant-factor approximation for the k-traveling salesperson problem was given by Frederickson, Hecht and Kim [8] (see also [11]). Recently, Fakcharoenphol, Harrelson and Rao [7] provided a constant-factor approximation algorithm for the k-traveling repairman problem, where the objective is to minimize the average waiting time of the customers.

The problems we study can also be viewed as members of the family of 
“clustering” problems. We are partitioning the vertices of the graph into clusters, 
where the weight of a cluster is the weight of the cheapest spanning tree or star 
in the cluster. Other versions of clustering studied recently include the cases 
where the weight of a cluster is the cost of a clique on it [2,10], the maximum 
cluster diameter (also called k-center) [6,12] and the sum of radii of clusters [3].

Our results and techniques. For both the rooted and un-rooted versions of k-tree cover, we get polynomial time approximation algorithms with performance ratio 4. Both algorithms can be made strongly polynomial with a slight loss in the approximation guarantee, which becomes 4 + ε. The algorithms are combinatorial and rely on a matching construct to prove the approximation guarantee.

We use LP rounding to provide a (4, 4) bi-criteria approximation algorithm for un-rooted k-star cover. That is, our algorithm outputs a solution that covers the graph with no more than 4k stars, and the cost of the solution is no more than four times the cost of an optimal solution that uses no more than k stars. Finally, we show that the rooted version of k-star cover reduces to a scheduling problem studied by Shmoys and Tardos [15]. This immediately yields a 2-approximation algorithm for this problem.

For the unrooted k-star problem, Levin [13] suggested an improvement of our (4, 4) bi-criteria approximation to a (3, 3) bi-criteria algorithm; this improvement is based on the minimum star cover approximation algorithm from [1].

Organization. We define the four versions of the problem in the next section. In Section 3, we prove that all four problems are NP-hard. We provide constant factor approximation algorithms for the rooted and un-rooted versions of k-tree cover in Section 4. We deal with k-star cover in Section 5. We conclude with some open questions in Section 6.
2 Problem Definition

k-tree cover. Let G = (V,E) denote an undirected graph with positive integral edge weights w : E → ℤ+. A tree cover of a graph G = (V,E) is a set T of trees {T_i}_i such that V = ⋃_i V(T_i). The cost of a tree cover T is max_{T_i∈T} w(T_i).

Note that trees in a tree cover may share nodes and even edges. The goal in the min-max k-tree cover problem is to find a minimum cost tree cover consisting of at most k trees.

Rooted k-tree cover. Let R ⊆ V denote a set of roots. An R-rooted tree cover of a graph G = (V,E) is a tree cover T in which each tree T_i ∈ T has a distinct root in R.

As before, trees in an R-rooted tree cover may share nodes and edges. In particular, the root of T_i may be in T_j, but the roots of T_j and T_i must be distinct. Given an edge-weighted graph G and a set R of roots, the min-max R-rooted tree cover problem is to find a minimum cost R-rooted tree cover of G.

Star covers. Star cover problems are defined over complete graphs (i.e. finite 
metric spaces). The goal is to cover the vertices of the graph with stars so that 
the maximum weight of a star is minimized. 

A star cover is a cover of the vertex set of a graph by stars. The cost of a
star cover is the weight of the heaviest star in the cover. In the min-max k-star
cover problem, the goal is to find a minimum cost star cover using at most k
stars. Let $R = \{r_i\}_i$ denote a set of roots. In the min-max R-centered star cover
problem, the goal is to find a minimum cost star cover $\mathcal{S} = \{S_r\}_{r \in R}$ such that,
for every $r \in R$, the center of $S_r$ is r.



3 Hardness 

In this section, we show that all four problems are NP-complete. We begin by
showing the NP-completeness of R-centered star cover, and then extend the
result to the other three problems.

We show the NP-completeness of R-centered star cover by reducing BIN-PACK
to it. An instance of BIN-PACK consists of (i) a set U of elements, where
the size of an element $u \in U$ is $s_u$, (ii) k bins, and (iii) a positive bin capacity B.
The problem is to determine if there is a partition of U into k parts $U_1, \ldots, U_k$
such that for every $i = 1, \ldots, k$, we have $\sum_{u \in U_i} s_u \le B$. This was shown to be
NP-hard in [9].

Theorem 1. The min-max R-centered star cover problem is NP-complete. 

Proof. Given an instance $\Pi = (U, \{s_u\}_u, k, B)$ of BIN-PACK, we transform it
to an instance of R-centered star cover as follows. We create a complete
bipartite graph $G(\Pi)$ with vertex set $R \cup U$, where R is a set of k new nodes
$R = \{r_1, \ldots, r_k\}$. For every $r_i$ and every $u \in U$, the weight of the edge $e = (r_i, u)$
is set to $w(e) = s_u$. We complete $G(\Pi)$ into a metric space (i.e. complete graph)
$K(\Pi)$ by taking the metric completion of the edge-weighted graph $G(\Pi)$. We
designate R to be the set of roots, and ask if there is an R-centered star cover
of $K(\Pi)$ of cost no more than B.

It is immediate that every bin packing induces an R-centered star cover of
the same cost. Conversely, every R-centered star cover induces a partition of
U, where the load of a bin equals the weight of the corresponding star. Since an
R-centered star cover of cost B gives a solution of cost B for BIN-PACK, the
theorem follows.

Algorithm 1 Rooted-Tree-Cover(G, R, B) - Compute an R-rooted tree cover of
G with cost at most 4B.

1: Remove all edges of weight greater than B.

2: $M \leftarrow$ MST of the graph obtained from G by contracting the roots in R to a single node.

3: $\{T_i\}_i \leftarrow$ forest obtained from M by un-contracting the roots in R.

4: Edge-decompose each tree $T_i$ into trees $\{S^i_j\}_j + L_i$ such that $w(S^i_j) \in [B, 2B)$ for
every j, and $w(L_i) < B$.

5: Try to match the trees $\{S^i_j\}_{i,j}$ to roots, subject to the constraint that a tree $S^i_j$
may be matched only to roots of distance at most B from it.

6: If not all trees are matched, then return fail: “B is too low”.

7: If every tree is matched, then return success: the set of trees where each tree consists
of $S^i_j$, its matched root r, and the leftover tree L (if any) that contains the root r.

The following theorem can be proved by converting, in polynomial time, an
optimal solution to any of the three other problems into an R-centered star cover
of the same cost.

Theorem 2. The following problems are NP-complete: min-max R-rooted tree cover,
min-max k-tree cover, and min-max k-star cover.
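The reduction in the proof of Theorem 1 is easy to mirror in code. The sketch below is our own illustration (the names `reduction_weights` and `max_star_cost` are hypothetical): it builds the root-to-element weights of $G(\Pi)$ and evaluates the R-centered star cover induced by a bin packing. It only evaluates root-to-element edges, assuming positive sizes; their distance in the metric completion $K(\Pi)$ is still $s_u$, because any other path from $r_i$ to u passes through some element u' twice as a detour and is strictly longer.

```python
def reduction_weights(sizes, k):
    """Edge weights of G(Pi): w(r_i, u) = s_u for every root r_i and element u.

    Roots are represented by the indices 0, ..., k-1; sizes maps u -> s_u > 0.
    """
    return {(i, u): s for i in range(k) for u, s in sizes.items()}


def max_star_cost(weights, stars):
    """Cost of an R-centered star cover: the weight of its heaviest star.

    stars maps a root index to the set of elements attached to that root.
    """
    return max(sum(weights[(i, u)] for u in leaves) for i, leaves in stars.items())


# A bin packing U_1 = {a, d}, U_2 = {b, c} with capacity B = 4 ...
sizes = {"a": 3, "b": 2, "c": 2, "d": 1}
packing = {0: {"a", "d"}, 1: {"b", "c"}}
# ... induces a star cover whose cost equals the largest bin load.
cost = max_star_cost(reduction_weights(sizes, 2), packing)
print(cost)  # 4
```

Conversely, reading the leaves of each star as the contents of a bin turns any R-centered star cover of cost at most B back into a feasible packing.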

4 Clustering into Trees 

4.1 R-rooted Tree Cover

Algorithm. In this section we present a 4-approximation algorithm for the
min-max R-rooted tree cover problem. A strongly polynomial version of this
algorithm has an approximation ratio of (4 + ε).

The approximation algorithm is based on Algorithm Rooted-Tree-Cover,
which is given (i) a graph G = (V, E) with edge weights w(e), (ii) a set R
of k roots, and (iii) a bound B on the weight of each tree. Algorithm
Rooted-Tree-Cover either returns with a proof that B is too small (i.e., $B < B^*$, the
minimum cost of a tree cover) or finds an R-rooted tree cover of cost at most 4B.
By applying binary search, a 4-approximation algorithm is obtained. At the end of
this section we discuss how to derive a strongly polynomial algorithm.
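One way to organize the binary search is around the invariant "the test fails at lo and succeeds at hi". In the sketch below, `oracle(B)` is a hypothetical stand-in for Algorithm Rooted-Tree-Cover: it returns `None` for “B is too low” (which certifies $B < B^*$) and otherwise a cover of cost at most 4B. Since edge weights are integers, $B^*$ is an integer; when the search stops, the test has failed at hi - 1, so $B^* \ge$ hi and the returned cover costs at most $4 \cdot \text{hi} \le 4B^*$. The argument does not need the test to be monotone in B.

```python
def binary_search_cover(oracle, lo, hi):
    """Return (B, cover) such that oracle(B - 1) failed and oracle(B) succeeded.

    oracle(B): hypothetical stand-in for Rooted-Tree-Cover; returns None when
    it certifies B < B*, otherwise a cover of cost at most 4B.
    Requires oracle(lo) to fail and oracle(hi) to succeed.
    """
    best = oracle(hi)
    if best is None:
        raise ValueError("oracle(hi) must succeed")
    while hi - lo > 1:
        mid = (lo + hi) // 2
        result = oracle(mid)
        if result is None:
            lo = mid                  # certificate that mid < B*
        else:
            hi, best = mid, result    # cover for the smallest successful B so far
    return hi, best


def toy(B):
    # Toy oracle for an instance with B* = 7 (illustration only).
    return None if B < 7 else ("cover of cost <= 4B", B)


print(binary_search_cover(toy, 0, 100))  # (7, ('cover of cost <= 4B', 7))
```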

A listing outlining Algorithm Rooted-Tree-Cover appears in Algorithm 1. We
now explain each step in detail. The algorithm begins by removing all edges of
weight greater than B, since no tree of weight at most B can use them. If, as a
result of deleting heavy edges, there exists a node that is no longer connected to R,
then obviously $B < B^*$. To keep the description simple, we assume that the graph
remains connected even after the heavy edges are deleted. In Line 2, the roots
in R are contracted to a single node, and the algorithm computes a minimum
spanning tree (MST) in the contracted graph. In Line 3, the MST is broken into
a set $\{T_i\}_i$ of k disjoint trees by un-contracting the nodes in R. Note that, by
construction, every tree $T_i$ is rooted at a root of R. In Line 4, the edge set of
every tree $T_i$ is decomposed into subtrees $\{S^i_j\}_j$ and $L_i$. The subtrees may share
nodes but not edges. The weight of every subtree $S^i_j$ is in the range [B, 2B), and
there is perhaps a leftover tree $L_i$ whose weight is less than B. We elaborate
below on how this edge decomposition is performed.

In Line 5, a bipartite graph is constructed as follows. One side of the vertex set is R
and the other side consists of nodes representing the trees $\{S^i_j\}_{i,j}$. An edge
connects a root r and a tree $S^i_j$ if the distance between $S^i_j$ and r is at most B. A
maximum matching is then computed in this bipartite graph. The algorithm now
considers two cases. If the maximum matching does not match all the subtrees,
then in Line 6 the algorithm reports a failure by returning the statement that B
is too small. If the maximum matching matches all the subtrees, then the algorithm
returns the set of trees where each tree consists of a subtree $S^i_j$, the root r matched
to $S^i_j$, a shortest path from r to $S^i_j$, and the leftover tree L (if any) that contains
the root r.
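Lines 1 to 3 can be sketched with Kruskal's algorithm: contracting R amounts to placing all roots in one union-find set before any edge is scanned. The function name and the `(w, u, v)` edge format are our own assumptions. Un-contracting then splits the MST edges into one tree per root, because a path in M between two roots would be a cycle in the contracted graph.

```python
def rooted_forest(vertices, edges, roots, B):
    """Lines 1-3 of Rooted-Tree-Cover (sketch).

    edges: list of (w, u, v) for undirected edges {u, v} of weight w.
    Returns dict root -> list of (parent, child, w) edges of the tree T_root.
    Raises ValueError if some vertex cannot reach R (then B < B*).
    """
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    roots = list(roots)
    for r in roots[1:]:                    # Line 2: contract R into one node
        parent[find(r)] = find(roots[0])
    mst = []
    light = (e for e in edges if e[0] <= B)            # Line 1: drop heavy edges
    for w, u, v in sorted(light, key=lambda e: e[0]):  # Kruskal
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            mst.append((w, u, v))
    if len({find(v) for v in vertices}) > 1:
        raise ValueError("a vertex is not connected to R: B is too low")

    adj = {v: [] for v in vertices}        # Line 3: un-contract R
    for w, u, v in mst:
        adj[u].append((v, w))
        adj[v].append((u, w))
    forest = {}
    for r in roots:                        # each MST component holds one root
        seen, stack, tree = {r}, [r], []
        while stack:
            x = stack.pop()
            for y, w in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
                    tree.append((x, y, w))
        forest[r] = tree
    return forest


V = [0, 1, 2, 3, 4]
E = [(1, 0, 1), (1, 1, 2), (5, 2, 3), (1, 3, 4), (2, 2, 4)]
print(rooted_forest(V, E, roots=[0, 4], B=3))
# {0: [(0, 1, 1), (1, 2, 1)], 4: [(4, 3, 1)]}
```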

We now elaborate on how the edge set of every tree is decomposed in Line 4.
Consider a tree $T_i$ rooted at r. For a node v, let $T_v$ denote the rooted subtree
hanging from v. Consider an edge e = (u, v) where u is the parent of v. The
subtree $T_e$ is the subtree that consists of three parts: u, $T_v$, and the edge (u, v).
The weight $w(T_e)$ of a subtree $T_e$ is the sum of the edge weights in $T_e$. Given the
threshold value B, depending on $w(T_e)$, a subtree $T_e$ is defined as light, medium
or heavy as follows. If $w(T_e) \ge 2B$, then $T_e$ is heavy. If $w(T_e) < B$, then $T_e$
is light. If $w(T_e) \in [B, 2B)$, then $T_e$ is medium. The decomposition algorithm
proceeds by splitting away subtrees. Recall that subtrees may share nodes; hence
splitting T' away from $T_i$ means: (i) designate T' as a new part,
(ii) remove the edges of T' from $T_i$, and (iii) let $T_i$ now contain only the nodes and
edges that are still connected to the root of $T_i$. Note that, when $T_{(u,v)}$ is split
away from $T_i$, the node u remains a node in $T_i$, so $T_{(u,v)}$ and the remaining tree
share the node u.

One can always split away medium subtrees from the (remaining) tree. Since
medium subtrees are split away whenever possible, we now focus on the case
in which every subtree is either light or heavy. In this case, let v denote a heavy
node all of whose children are light (that is, $T_{(v,c)}$ is light for every child c of v).
We bunch edges $e_1, e_2, \ldots$ emanating from v to children of v until the first time
the cumulative weight of the trees hanging from these edges is at least B. We then
split away the subtree $\bigcup_i T_{e_i}$ (note that this tree is a medium subtree, since
$w(T_{e_i}) < B$ for every i). The decomposition stops as soon as the weight of the
remaining tree is less than B. If upon termination the edge set of $T_i$ is not empty,
then $T_i$ is declared a leftover tree $L_i$. Note that in this case the root of the
leftover tree $L_i$ is r (where $r \in R$ is the root of the tree $T_i$).
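A compact way to realize this decomposition is a single post-order pass. We offer it as an illustrative variant, not as the authors' exact procedure; the function name and the `(parent, child, w)` input format are assumptions. It keeps the invariant that, after a child c has been processed, the edges still hanging below c weigh less than B. Because every edge weighs at most B after Line 1, each $T_{(v,c)}$ then weighs less than 2B, so it is either split away as a medium subtree or bunched with light siblings at v.

```python
def edge_decompose(tree_edges, root, B):
    """Split a rooted tree's edges into parts of weight in [B, 2B) plus a leftover.

    tree_edges: list of (parent, child, w); every w must be at most B.
    Returns (parts, leftover): parts is a list of (edges, weight) pairs with
    B <= weight < 2B, and leftover is an (edges, weight) pair with weight < B.
    Recursive, so very deep trees may exceed Python's recursion limit.
    """
    children, weight = {}, {}
    for p, c, w in tree_edges:
        children.setdefault(p, []).append(c)
        weight[c] = w
    parts = []

    def visit(v):
        # Returns the edges still attached below v; their total weight is < B.
        bunch, bunch_w = [], 0
        for c in children.get(v, []):
            below, below_w = visit(c)
            sub, sub_w = below + [(v, c)], below_w + weight[c]  # T_(v,c)
            if sub_w >= B:              # medium, since below_w < B and w <= B
                parts.append((sub, sub_w))
            else:                       # light: bunch it with its siblings
                bunch, bunch_w = bunch + sub, bunch_w + sub_w
                if bunch_w >= B:        # medium, since each piece is < B
                    parts.append((bunch, bunch_w))
                    bunch, bunch_w = [], 0
        return bunch, bunch_w

    return parts, visit(root)


print(edge_decompose([(0, 1, 1), (1, 2, 1), (2, 3, 1)], 0, 2))
# ([([(2, 3), (1, 2)], 2)], ([(0, 1)], 1))
```

The leftover returned for the root plays the role of $L_i$; it always contains the root r of $T_i$ when it is non-empty.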



Correctness and Approximation Ratio. In this section we prove two lemmas:
Lemma 1 proves the correctness of the algorithm and Lemma 2 proves its
approximation ratio. Let $B^*$ be the minimum cost of an R-rooted tree cover of G.

Lemma 1. If Algorithm Rooted-Tree-Cover returns “B is too low”, then $B^* > B$.

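Proof. We prove the contrapositive, namely, if $B \ge B^*$, then there exists a matching
in the bipartite graph that matches every subtree in $\{S^i_j\}_{i,j}$ to a root in R.
By Hall's Marriage Theorem [5], the existence of such a matching is equivalent
to the condition that, for every subset $\mathcal{S}$ of trees from $\{S^i_j\}_{i,j}$, the neighbor set
$N(\mathcal{S})$ of $\mathcal{S}$ satisfies $|N(\mathcal{S})| \ge |\mathcal{S}|$.

Consider a subset $\mathcal{S}$ of trees from $\{S^i_j\}_{i,j}$. Every tree $S \in \mathcal{S}$ satisfies
$w(S) \in [B, 2B)$. Hence, $w(\mathcal{S}) \ge B \cdot |\mathcal{S}|$, where $w(\mathcal{S})$ is the total weight of the trees in $\mathcal{S}$.

Consider an optimal R-rooted tree cover $\mathcal{T}^* = \{T^*_1, \ldots, T^*_k\}$. Let $\mathcal{T}^*(\mathcal{S})$
denote the subset of trees of $\mathcal{T}^*$ that are stabbed by trees from $\mathcal{S}$. Namely,
$T^* \in \mathcal{T}^*(\mathcal{S})$ iff there exists a tree $S \in \mathcal{S}$ such that $S \cap T^*$ is non-empty. Note that
there is an edge in the bipartite graph between a tree $S^i_j$ and a root r if the tree $T^*_r$
rooted at r intersects $S^i_j$, since $w(T^*_r) \le B^* \le B$ bounds the distance from r to $S^i_j$.
Hence $|N(\mathcal{S})| \ge |\mathcal{T}^*(\mathcal{S})|$, and it suffices to prove that $|\mathcal{T}^*(\mathcal{S})| \ge |\mathcal{S}|$.
Note that the weight of $\mathcal{T}^*(\mathcal{S})$ satisfies $w(\mathcal{T}^*(\mathcal{S})) \le B^* \cdot |\mathcal{T}^*(\mathcal{S})|$.

Every node in $\bigcup \mathcal{S}$ is connected by edges in $\bigcup \mathcal{T}^*(\mathcal{S})$ to a root. Recall that
every edge in $\bigcup \mathcal{S}$ is also an edge in the MST M (from Line 2). Let M' denote the
subgraph obtained from the MST M by deleting the edges in $\bigcup \mathcal{S}$ and then adding
the edges in $\bigcup \mathcal{T}^*(\mathcal{S})$. Every vertex is connected in M' to a root; hence, the subgraph
M' is connected if the roots are contracted. It follows that $w(M') \ge w(M)$, and since
$w(M') \le w(M) - w(\mathcal{S}) + w(\mathcal{T}^*(\mathcal{S}))$, we get $w(\mathcal{T}^*(\mathcal{S})) \ge w(\mathcal{S})$. We conclude that
$B^* \cdot |\mathcal{T}^*(\mathcal{S})| \ge w(\mathcal{T}^*(\mathcal{S})) \ge w(\mathcal{S}) \ge B \cdot |\mathcal{S}|$. Since $B^* \le B$, it follows that
$|\mathcal{T}^*(\mathcal{S})| \ge |\mathcal{S}|$. Hence, Hall's condition
holds, and the lemma follows.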
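The matching test of Lines 5 and 6, whose success Lemma 1 guarantees when $B \ge B^*$, can be sketched with the classical augmenting-path (Kuhn) method for bipartite matching. Here `near(s, r)` is a hypothetical predicate for “the distance between subtree s and root r is at most B”; computing those distances is not shown. A standard property of this method is that a subtree for which no augmenting path is found stays unmatched in the final maximum matching, so the sketch can report failure at once.

```python
def match_subtrees(num_subtrees, roots, near):
    """Lines 5-6 of Rooted-Tree-Cover (sketch): match every subtree to a root.

    Subtrees are numbered 0, ..., num_subtrees - 1. near(s, r) is a
    hypothetical predicate: True iff subtree s is within distance B of root r.
    Returns dict subtree -> root, or None ("B is too low") if some subtree
    cannot be matched.
    """
    owner = {}  # root -> subtree currently matched to it

    def augment(s, visited):
        # Search for an augmenting path that starts at subtree s.
        for r in roots:
            if r not in visited and near(s, r):
                visited.add(r)
                if r not in owner or augment(owner[r], visited):
                    owner[r] = s
                    return True
        return False

    for s in range(num_subtrees):
        if not augment(s, set()):
            return None
    return {s: r for r, s in owner.items()}


pairs = {(0, "a"), (0, "b"), (1, "a")}
print(match_subtrees(2, ["a", "b"], lambda s, r: (s, r) in pairs))
# {1: 'a', 0: 'b'}
```

In the example, subtree 1 can only use root a, so the augmenting path re-routes subtree 0 from a to b.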

The following lemma proves that the approximation ratio is 4. 

Lemma 2. When successful, Algorithm Rooted-Tree-Cover finds an R-rooted
tree cover of cost at most 4B.

Proof. By construction, each tree returned by the algorithm has a distinct root
from R, and every node belongs to at least one tree. The weight of each tree is at
most the sum of the weight of the tree $S^i_j$ (which is less than 2B), the weight of the
path from the root r to a node in $S^i_j$ (which is at most B), and the weight
of the leftover tree (which is less than B). It follows that the weight of every
tree is less than 4B, and the lemma follows.

Note that a path from a root r to a subtree $S^i_j$ may contain edges and nodes
that also belong to other trees. Hence, when successful, Algorithm Rooted-Tree-Cover
covers the graph with trees, but these trees are not disjoint.
Algorithm 2 Tree-Cover(G, k, B) - Compute a k-tree cover of G with cost at
most 4B.

1: Remove all edges of weight greater than B. Let $\{G_i\}_i$ denote the connected
components after deleting heavy edges.

2: $MST_i \leftarrow$ MST of $G_i$.

3: $k_i \leftarrow \lfloor w(MST_i)/(2B) \rfloor$.

4: If $\sum_i (k_i + 1) > k$ then return fail: “B is too low”.

5: Edge-decompose each tree $MST_i$ into at most $k_i + 1$ trees $\{S^i_j\}_j + L_i$ such that
$w(S^i_j) \in [2B, 4B)$ for every j, and $w(L_i) < 2B$.

6: Return success: the set of trees $\bigcup_{i,j} \{S^i_j\} \cup \{L_i\}_i$.



Strongly Polynomial Algorithm. Let n = |V| and consider an ε > 0. Our
goal is to find a (4 + ε)-approximation algorithm that is polynomial in n and in
log(1/ε). Sort the edge weights, and let $w_1 \le w_2 \le \cdots \le w_m$ denote the sorted edge
weights. Obviously $B^* \le n \cdot w_m$. If Algorithm Rooted-Tree-Cover reports that
$B < B^*$ for $B = w_m$, then the total weight of all edges of weight at most $w_m/(n^2/\varepsilon)$
is less than $\varepsilon \cdot B^*$. Hence, we may contract all these edges, and consider only
the remaining edges, of weight at least $\varepsilon \cdot w_m/n^2$. Now binary search within the
range $[\varepsilon \cdot w_m/n^2, n \cdot w_m]$ is strongly polynomial.

If Algorithm Rooted-Tree-Cover does not fail with $B = w_m$, we do the following.
Let i denote an index such that (i) Algorithm Rooted-Tree-Cover reports
$B < B^*$ for $B = w_i$, and (ii) Algorithm Rooted-Tree-Cover finds an R-rooted
tree cover of cost at most $4 \cdot w_{i+1}$. Hence $B^* \in (w_i, 4 \cdot w_{i+1}]$. If $w_{i+1}/w_i < n^2/\varepsilon$,
binary search within the range $[w_i, 4 \cdot w_{i+1}]$ is strongly polynomial. Otherwise, let
$w' = (n^2/\varepsilon) \cdot w_i$. Run Algorithm Rooted-Tree-Cover with $B = w'$. If the algorithm
finds an R-rooted tree cover of cost at most $4w'$, binary search within the range $[w_i, w']$
is strongly polynomial. If the algorithm reports that $w' < B^*$, the total weight of all
edges of weight at most $w_i$ is bounded by $n^2 \cdot w_i < \varepsilon \cdot B^*$. Hence, we may contract
all these edges, and consider only the remaining edges, of weight at most $4 \cdot w_{i+1}$.
Now binary search within the range $[w_{i+1}, 4 \cdot w_{i+1}]$ is strongly polynomial.

By combining Lemmas 1 and 2 with the above discussion, we conclude with
the following theorem. Note that if the edge weights are polynomial, then a
4-approximation algorithm follows.

Theorem 3. For every ε > 0, there is a (4 + ε)-approximation algorithm for
min-max rooted tree cover that runs in time polynomial in the size of G and log(1/ε).

4.2 k-tree Cover

In this section we present a 4-approximation algorithm for the k-tree cover
problem. A strongly polynomial version of this algorithm has an approximation ratio
of (4 + ε).

Algorithm. A listing of Algorithm Tree-Cover appears as Algorithm 2. The
input consists of (i) a graph G = (V, E) with positive integral edge weights w(e),
(ii) a bound k on the number of trees allowed in the cover, and (iii) a bound
B on the weight of each tree in the cover. The algorithm returns either “fail”
(meaning that B is too small), or “success” with a tree cover whose cost
is bounded by 4B.

As in Algorithm Rooted-Tree-Cover, Algorithm Tree-Cover begins by removing
edges of weight greater than B. The removal of heavy edges may disconnect G;
we denote the connected components by $\{G_i\}_i$. In Line 2, a minimum
spanning tree $MST_i$ is computed for each component $G_i$. In Line 3, an
estimate $k_i$ of the number of trees needed to cover $G_i$ is computed. In Line 4,
the algorithm returns “fail” if the estimates add up to more than k. This means
that the algorithm has a proof that the cost of an optimal k-tree cover of G is greater
than B. In Line 5, each tree $MST_i$ is edge-decomposed into at most $k_i + 1$
subtrees. Each subtree has cost less than 4B. The edge-decomposition procedure is
the same procedure that is used in Line 4 of Algorithm Rooted-Tree-Cover (with
the threshold 2B instead of B). In Line 6, a tree cover consisting of at most k
trees is returned. The cost of the returned tree cover is at most 4B.

Note that the edge-decomposition procedure decomposes $MST_i$ into at most
$k_i + 1$ trees. Since the threshold is 2B, the weight of each tree
$S^i_j$ is at least 2B and less than 4B. It follows that the number of trees $\{S^i_j\}_j$
obtained when decomposing $MST_i$ is at most $k_i$. Together with the leftover
tree $L_i$ (if it exists), we obtain at most $k_i + 1$ trees.
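Lines 1 to 4 of Algorithm Tree-Cover can be sketched as follows. Running Kruskal's algorithm on the light edges yields a minimum spanning forest whose trees are exactly the $MST_i$. The function name and edge format are assumptions, and Line 3 is taken to be $k_i = \lfloor w(MST_i)/(2B) \rfloor$, which is the bound used in the proof of Lemma 3.

```python
def tree_cover_estimates(vertices, edges, k, B):
    """Lines 1-4 of Tree-Cover (sketch).

    edges: list of (w, u, v) with positive integer weights; B > 0.
    Returns a list of (mst_edges, k_i) pairs, one per component G_i of the
    graph without heavy edges, or None when Line 4 reports "B is too low".
    """
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    forest = []  # Kruskal on the light edges: an MST of every component G_i
    light = (e for e in edges if e[0] <= B)
    for w, u, v in sorted(light, key=lambda e: e[0]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))

    components = {find(v): [] for v in vertices}  # one entry per G_i
    for w, u, v in forest:
        components[find(u)].append((w, u, v))

    result = []
    for mst in components.values():
        k_i = sum(w for w, _, _ in mst) // (2 * B)  # assumed form of Line 3
        result.append((mst, k_i))
    if sum(k_i + 1 for _, k_i in result) > k:       # Line 4
        return None
    return result


V = [0, 1, 2, 3, 4]
E = [(3, 0, 1), (3, 1, 2), (10, 2, 3), (1, 3, 4)]
print(tree_cover_estimates(V, E, k=3, B=3))
# [([(3, 0, 1), (3, 1, 2)], 1), ([(1, 3, 4)], 0)]
```

With k = 2 the same instance fails, since the two components need at least $(1 + 1) + (0 + 1) = 3$ trees.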

Correctness and Approximation Ratio. Let $B^*$ denote the cost of a min-max
k-tree cover of G. Let $\mathcal{T}^* = \{T^*_1, \ldots, T^*_k\}$ denote an optimal k-tree cover.
If $B^* \le B$, then $\mathcal{T}^*$ uses only edges of weight at most B. Let $k^*_i$ denote the
number of trees in $\mathcal{T}^*$ that contain nodes of $G_i$.

Lemma 3. If $B^* \le B$, then $k_i + 1 \le k^*_i$ for every i.

Proof. For simplicity, let $T^*_1, \ldots, T^*_{k^*_i}$ denote the trees that cover $G_i$. By adding
at most $k^*_i - 1$ edges of $G_i$, one can connect these $k^*_i$ trees to obtain a tree that spans
$G_i$. Since the cost of each such edge is at most B, we obtain
$$\sum_{j=1}^{k^*_i} w(T^*_j) + (k^*_i - 1) \cdot B \ge w(MST_i).$$
Since $w(T^*_j) \le B$ for every j, we obtain $k^*_i \ge (w(MST_i) + B)/(2B) > w(MST_i)/(2B)$. The
lemma follows because $k_i \le w(MST_i)/(2B)$ and both $k_i$ and $k^*_i$ are integers.

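Since every tree of $\mathcal{T}^*$ lies within a single component $G_i$, we have $\sum_i k^*_i \le k$, so
Lemma 3 immediately implies the following lemma.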

Lemma 4. If Algorithm Tree-Cover returns “B is too low”, then B* > B. 

We conclude with the following theorem. Note that a 4-approximation algorithm
is obtained if the edge weights are polynomial.

Theorem 4. For every ε > 0, there is a (4 + ε)-approximation algorithm for
min-max k-tree cover that runs in time polynomial in the size of the graph and in
log(1/ε).

Proof. When $B \ge B^*$, Lemma 4 implies that Algorithm 2 is successful and a
k-tree cover of cost at most 4B is computed. A strongly polynomial binary search
along the lines of Section 4.1 completes the proof.
5 Clustering into Stars

In this section we discuss the un-rooted and rooted k-star cover problems. Here,
we are given an undirected complete graph G = (V, E), a metric c on the edges
of E and a parameter k > 0. In the rooted version, we are also given a set of
root nodes R with |R| = k. For a set $S \subseteq V$ and a vertex $r \in V$, let c(S, r) be
the cost of the star rooted at r and spanning S, i.e. $c(S, r) = \sum_{j \in S} c_{rj}$.

In the un-rooted version, we are supposed to partition the vertex set into k
subsets $S_1, \ldots, S_k$ and place k roots $r_1, \ldots, r_k$ such that $\max_{1 \le i \le k} c(S_i, r_i)$ is as
small as possible. In the rooted version, $r_1, \ldots, r_k$ are given.



5.1 Un-rooted k-star Cover

We use linear programming techniques similar to those used in facility location
problems [4,14] to solve this problem. We first give a natural integer programming
formulation. We have a variable $y_i$ for each $i \in V$ that indicates the number
of stars that are rooted at node i. For each pair of nodes i and j, the variable $x_{ij}$
has value one if node j is in a star rooted at node i and zero otherwise. As in the
previous section, we guess the maximum star cost B of an optimum solution,
and try to minimize the number of stars needed to cover the graph.



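$$
\begin{aligned}
(IP_B) \qquad \min \ & \sum_{i \in V} y_i \\
\text{s.t.} \ & \sum_{i \in V} x_{ij} \ge 1 && \forall j \in V \qquad (1) \\
& x_{ij} \le y_i && \forall i, j \in V \qquad (2) \\
& \sum_{j \in V} x_{ij} c_{ij} \le B \cdot y_i && \forall i \in V \qquad (3) \\
& x_{ij} \in \{0, 1\}, \ y_i \in \mathbb{N} && \forall i, j \in V \qquad (4)
\end{aligned}
$$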



Constraints (1) ensure that each node $j \in V$ is assigned to a root; (2) makes
sure that node i is marked as a root if a node j is assigned to it; (3) bounds the
total cost of the stars rooted at node i by B times their number; and finally (4) sets
integrality constraints. We denote the linear programming relaxation of $(IP_B)$ by $(LP_B)$.
Let (x, y) be a solution to $(LP_B)$. The following observation is immediate.

Lemma 5. Suppose there exists a solution to the un-rooted k-star cover problem
with value B. Then $(LP_B)$ has a solution (x, y) such that $\sum_{i \in V} y_i \le k$.

Our algorithm now rounds an optimal fractional solution (x, y) of $(LP_B)$ to
an integer solution $(\hat{x}, y')$ such that the cost of each star is no more than
4B and $\sum_i y'_i \le 4k$. The algorithm begins with the process of filtering, which produces a
new fractional solution $(\bar{x}, \bar{y})$ that costs not much more than (x, y) but has the
property that $\bar{x}_{ij}$ is positive only for pairs (i, j) that are close together. This allows
us to re-assign vertices to roots as we round $\bar{y}$ up to an integer solution and set
some y variables to zero. The process and a proof of its performance guarantee
are described in detail in the next lemma.

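Lemma 6. Suppose that $(LP_B)$ has a solution (x, y) such that $\sum_{i \in V} y_i \le k$.
Then we can find in polynomial time a 4k-star cover of cost at most 4B.

Proof. Define the fractional assignment cost $\bar{c}_j$ as $\bar{c}_j = \sum_{i \in V} c_{ij} x_{ij}$ for each
$j \in V$. Now, define
$$\bar{x}_{ij} = \begin{cases} \min\{1, 2x_{ij}\} & \text{if } c_{ij} \le 2\bar{c}_j \\ 0 & \text{otherwise} \end{cases}$$
and $\bar{y}_i = 2 \cdot y_i$ for all $i, j \in V$. It is not hard to see that $(\bar{x}, \bar{y})$ is a feasible
solution for $(LP_B)$: by Markov's inequality, at least half of the fractional assignment of j
goes to roots i with $c_{ij} \le 2\bar{c}_j$. Scaling $\bar{y}$, we can assume w.l.o.g. that
$\sum_{i \in V} \bar{y}_i = 2k$.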

We now define 0-1 variables $\hat{x}_{ij}$ as follows. Let C denote the set of “unassigned”
nodes and R the set of “opened” roots. We initialize $C \leftarrow V$ and $R \leftarrow \emptyset$.
As long as C is non-empty, pick $v \in C$ that attains the minimum $\bar{c}_v$ value among
all nodes remaining in C, i.e. pick v such that $\bar{c}_v = \min_{u \in C} \bar{c}_u$.

We add v to R. For node v, let $P_v$ be the set of roots that v connects to, i.e.
$P_v = \{i \in V : \bar{x}_{iv} > 0\}$, and let $M_v$ be the set of nodes in C that are served by
roots in $P_v$, that is, $M_v = \{j \in C : \exists i \in P_v \text{ s.t. } \bar{x}_{ij} > 0\}$. We now assign all
nodes in $M_v$ to v, i.e. for all $j \in M_v$, let
$$\hat{x}_{lj} = \begin{cases} 1 & \text{if } l = v \\ 0 & \text{otherwise.} \end{cases}$$
Finally, we remove $M_v$ from C and continue, until $C = \emptyset$.

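For every root $i \in R$, let $\hat{y}_i = \sum_{\{j : \hat{x}_{ij} = 1\}} 2\bar{c}_j / B$. If $i \notin R$, then set $\hat{y}_i = 0$.
We now claim that: (i) $\sum_i \hat{y}_i \le 2k$; (ii) the cost of the star rooted at $v \in R$ is
bounded by $2B \cdot \hat{y}_v$.

Firstly, (i) follows because of the following inequalities:
$$\sum_{i \in V} \hat{y}_i = \frac{2}{B} \sum_{j \in V} \bar{c}_j = \frac{2}{B} \sum_{i \in V} \sum_{j \in V} c_{ij} x_{ij} \le \frac{2}{B} \sum_{i \in V} B \cdot y_i \le 2k.$$

Now, observe that the reason for assigning node j to the root v must have
been a (fractional solution) root i that serves both v and j, i.e. $\bar{x}_{ij} > 0$ and
$\bar{x}_{iv} > 0$. But this means that $c_{jv} \le c_{ij} + c_{iv} \le 2\bar{c}_j + 2\bar{c}_v \le 4\bar{c}_j$, where the last
inequality follows from our choice of v.

This proves (ii), since the cost of the star rooted at v satisfies:
$$\sum_{\{j : \hat{x}_{vj} = 1\}} c_{vj} \le \sum_{\{j : \hat{x}_{vj} = 1\}} 4\bar{c}_j = 2B \cdot \hat{y}_v.$$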

We now show that $|R| \le 2k$. Consider any two vertices $u, v \in R$. By definition
of our procedure of adding vertices to R, we must have $P_u \cap P_v = \emptyset$. However,
$\sum_{i \in V} \bar{x}_{iv} \ge 1$ together with $\bar{x}_{iv} \le \bar{y}_i$ means $\sum_{i \in P_v} \bar{y}_i \ge 1$ for all $v \in R$. Since we also have
$\sum_{i \in V} \bar{y}_i = 2k$, we must have $|R| \le 2k$.

Finally, we round the $\hat{y}_i$ variables to integer variables defined as $y'_i = \lceil \hat{y}_i \rceil$.
Since we have $|R| \le 2k$ and $\sum_i \hat{y}_i \le 2k$, we obtain $\sum_i y'_i \le 4k$. We open $y'_i$ star
centers at node i, thus creating at most 4k stars in total.

Using the fact that in $(IP_B)$ we only had $x_{ij} > 0$ for those pairs (i, j) where
$c_{ij} \le B$, it follows that whenever $y'_i \ge 1$, we can assign the nodes served by i
to distinct stars centered at node i such that each star has total cost no more
than 4B. This is because (i) we must have $\bar{c}_v \le B$ for all $v \in V$, meaning that
$\bar{x}_{iv} > 0$ only if $c_{iv} \le 2B$, and (ii) we have already guaranteed that the cost of
the star rooted at v in the solution given by $\hat{y}$ is bounded above by $2B \cdot \hat{y}_v$.

Thus we finish with an integral solution y' and an integral assignment of
nodes to stars given by $\hat{x}$, such that there are no more than 4k stars in total and
each star has cost no more than 4B.
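The filtering and greedy clustering steps in the proof of Lemma 6 can be sketched as follows. The function name, the dense-matrix input format, and the tie-breaking by node index are our own choices. The sketch returns the assignment $\hat{x}$ (as a map from each node to its opened root) together with the values $\hat{y}_v$; it omits the final rounding $y'_i = \lceil \hat{y}_i \rceil$ and the splitting of each root's nodes into stars.

```python
def filter_and_cluster(x, c, B):
    """Filtering plus greedy clustering from the proof of Lemma 6 (sketch).

    x[i][j]: fractional assignment of node j to root i, feasible for (LP_B).
    c[i][j]: metric cost between nodes i and j.
    Returns (center, y_hat): center[j] is the opened root v that node j is
    assigned to, and y_hat[v] is the sum of 2 * cbar_j / B over those nodes j.
    """
    n = len(c)
    # Fractional assignment cost cbar_j of each node j.
    cbar = [sum(c[i][j] * x[i][j] for i in range(n)) for j in range(n)]
    # Filtering: keep roots i with c_ij <= 2 * cbar_j and double x_ij (capped at 1).
    xbar = [[min(1.0, 2 * x[i][j]) if c[i][j] <= 2 * cbar[j] else 0.0
             for j in range(n)] for i in range(n)]

    unassigned = set(range(n))
    center, y_hat = {}, {}
    while unassigned:
        v = min(unassigned, key=lambda u: (cbar[u], u))  # smallest cbar_v first
        P_v = {i for i in range(n) if xbar[i][v] > 0}
        M_v = {j for j in unassigned if any(xbar[i][j] > 0 for i in P_v)}
        M_v.add(v)  # v is already in M_v for feasible x; guards rounding errors
        for j in M_v:
            center[j] = v
        y_hat[v] = sum(2 * cbar[j] / B for j in M_v)
        unassigned -= M_v
    return center, y_hat


c = [[0, 1, 10], [1, 0, 9], [10, 9, 0]]  # three points on a line
x = [[1, 1, 0], [0, 0, 0], [0, 0, 1]]    # an LP solution (integral here)
print(filter_and_cluster(x, c, B=2))
# ({0: 0, 1: 0, 2: 2}, {0: 1.0, 2: 0.0})
```

In the example, the star at node 0 costs $c_{00} + c_{01} = 1$, within the bound $2B \cdot \hat{y}_0 = 4$ from claim (ii).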

Lemmas 5 and 6 yield the following theorem. This can also be converted into 
a strongly polynomial algorithm by applying the methods used in Section 4.1. 

Theorem 5. There is a polynomial-time algorithm for the un-rooted k-star cover 
problem that partitions the set of nodes into 4k stars each of which has cost 
bounded by 4B, where B is the value of an optimum solution. 

5.2 Rooted k-star Cover

In the rooted version of k-star cover we are given a root set R of cardinality k in
addition to the usual problem parameters, and we are supposed to use the roots in
R. Notice that this problem is equivalent to the following scheduling problem: we
are given k machines $M_1, \ldots, M_k$ (one for each root in R) and n jobs $J_1, \ldots, J_n$
(one for each node). For each job-machine pair $(M_i, J_j)$ we have a processing time $c_{ij}$
(the distance between root $r_i$ and node j). The objective now is to assign each job to a
unique machine. Let $\mathcal{J}_i$ be the set of jobs assigned to machine $M_i$. We want to minimize
the make-span, i.e. $\max_{1 \le i \le k} \sum_{J_j \in \mathcal{J}_i} c_{ij}$. It is
easy to see that this problem is equivalent to the rooted k-star cover problem.
A 2-approximation due to Shmoys and Tardos [15] implies the following theorem:

Theorem 6. There is a polynomial-time 2-approximation for the rooted k-star
cover problem.
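To make the equivalence with scheduling concrete, the exhaustive search below computes an optimal rooted k-star cover of a tiny instance by treating roots as machines and nodes as jobs. It is exponential in the number of nodes and serves only as an illustration; it is not the LP-based algorithm of Shmoys and Tardos [15]. The function name and the matrix `cost[i][j]`, the distance from root $r_i$ to node j, are assumed inputs.

```python
from itertools import product


def rooted_star_cover_bruteforce(cost):
    """Optimal rooted k-star cover of a tiny instance by exhaustive search.

    cost[i][j] is c(r_i, j): the processing time of job j on machine M_i.
    Returns (makespan, assignment) with assignment[j] = index of j's root.
    """
    k, n = len(cost), len(cost[0])
    best = None
    for assignment in product(range(k), repeat=n):  # all k**n assignments
        loads = [0] * k                              # star cost per root
        for j, i in enumerate(assignment):
            loads[i] += cost[i][j]
        if best is None or max(loads) < best[0]:
            best = (max(loads), assignment)
    return best


print(rooted_star_cover_bruteforce([[1, 4, 3], [2, 1, 3]]))  # (3, (1, 1, 0))
```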

6 Open Questions 

The more obvious integer programming formulation for k-star cover would minimize
B subject to opening no more than k roots. However, we were unable
to prove constant factor upper bounds on the integrality gap of that formulation.
It would be interesting to see if that formulation, or some other technique,
yields a constant factor approximation algorithm for k-star cover which obeys
exactly the constraint that no more than k stars are used.
Our algorithms for k-tree cover immediately yield constant factor approximation
algorithms for the “Nurse station location” problem (which one might
call the k-tour cover problem) that motivated this research. However, it may be
possible to obtain improved approximation factors by attacking the k-tour cover
problem directly, instead of going via k-tree cover.



Acknowledgments 

We would like to thank Asaf Levin for sending us a copy of [1] and for his
improved bi-criteria approximation algorithm for the unrooted k-star problem.



References 

1. E. Arkin, R. Hassin and A. Levin. Approximations for minimum and min-max 
vehicle routing problems. Manuscript, 2003. 

2. Y. Bartal, M. Charikar and D. Raz. Approximating min-sum k-clustering in metric
spaces. In Proceedings of the 33rd Annual ACM Symposium on Theory of Computing,
11-20, 2001.

3. M. Charikar and R. Panigrahy. Clustering to minimize the sum of cluster diameters.
In Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, 1-10,
2001.

4. F. Chudak and D. Shmoys. Improved approximation algorithms for a capacitated
facility location problem. In Proceedings of the 10th Annual ACM-SIAM Symposium
on Discrete Algorithms, 875-876, 1999.

5. R. Diestel. Graph Theory, Springer- Verlag, Berlin, 2000. 

6. M. Dyer and A. Frieze. A simple heuristic for the p-center problem. Operations 
Research Letters, 3(6):285-288, 1985. 

7. J. Fakcharoenphol, C. Harrelson and S. Rao. The k-traveling repairman problem.
In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms,
655-664, 2003.

8. G.N. Frederickson, M.S. Hecht and C.E. Kim. Approximation algorithms for some 
routing problems. SIAM J. Computing 7:178-193, 1978. 

9. M. Garey and D. Johnson. Computers and Intractability: A Guide to the Theory of
NP-Completeness, W.H. Freeman, San Francisco, 1979.

10. N. Guttman-Beck and R. Hassin. Approximation algorithms for min-sum p- 
clustering. Discrete Applied Mathematics, 89:125-142, 1998. 

11. M. Haimovich, A. Rinnooy Kan and L. Stougie. Vehicle Routing: Methods and 
Studies, Elsevier, 1988. 

12. D. Hochbaum and D. Shmoys. A best possible approximation algorithm for the
k-center problem. Mathematics of Operations Research, 10(2):180-184, 1985.

13. A. Levin. Private communication, May 2003.

14. D. Shmoys, E. Tardos and K. Aardal. Approximation algorithms for facility
location problems. In Proceedings of the 29th Annual ACM Symposium on Theory of
Computing, 265-274, 1997.

15. D. Shmoys and E. Tardos. An approximation algorithm for the generalized assign- 
ment problem. Mathematical Programming A, 62:461-474, 1993. 

16. P. Toth and D. Vigo (editors). The Vehicle Routing Problem, SIAM monographs
on discrete mathematics and applications, 2002.



An Improved Decomposition Theorem for 
Graphs Excluding a Fixed Minor 



Jittat Fakcharoenphol¹* and Kunal Talwar²**

¹ Kasetsart University,

Bangkok, Thailand.
jtf@ku.ac.th

² Computer Science Division,
University of California, Berkeley
kunal@cs.berkeley.edu



Abstract. Given a graph G and a parameter δ, we want to decompose
the graph into clusters of diameter δ without cutting too many edges.
For any graph that excludes a $K_{r,r}$ minor, Klein, Plotkin and Rao [15]
showed that this can be done while cutting only an $O(r^3/\delta)$ fraction of the
edges. This implies a bound on the multicommodity max-flow min-cut ratio
for such graphs. This result, as well as the decomposition theorem, has
found numerous applications to approximation algorithms and metric
embeddings for such graphs.

In this paper, we improve the above decomposition results from $O(r^3)$
to $O(r^2)$. This shows that for graphs excluding any minor of size r,
the multicommodity max-flow min-cut ratio is at most $O(r^2)$ (for the
uniform demand case). This also improves the performance guarantees
of several applications of the decomposition theorem.



1 Introduction 

A natural generalization of the s-t flow problem is the multicommodity flow 
problem, where we want to simultaneously route several commodities. Each com- 
modity has a source and a sink, and the goal is to route the flows so that the 
total flow on any edge does not exceed its capacity. An optimization version 
of this problem is the concurrent flow problem, first defined by Shahrokhi and
Matula [32], where we wish to maximize the throughput λ such that we can
feasibly route a λ fraction of each demand.

The sparsity of a cut $(S, \bar{S})$ is defined as $c(S, \bar{S})/d(S, \bar{S})$, where $c(S, \bar{S})$ is the
sum of capacities of edges between S and $\bar{S}$, and $d(S, \bar{S})$ is the total demand from
some source (sink) in S to a sink (source) in $\bar{S}$. The sparsity of any cut gives an
upper bound on the maximum throughput. For the single commodity case, the
max-flow min-cut theorem of Ford and Fulkerson [9] and of Elias, Feinstein and
Shannon [8] says that the maximum flow equals the value of the sparsest cut,
and also gives an algorithm for finding the minimum cut.
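The definition of sparsity translates directly into code. The sketch below, with a hypothetical function name and dictionary-based inputs, evaluates it for a single cut and assumes that some demand crosses the cut. On a unit-capacity 4-cycle with unit demand between every pair of nodes, splitting the cycle into two paths crosses 2 edges and 4 demand pairs, so no throughput above 1/2 is possible.

```python
def cut_sparsity(S, capacity, demand):
    """Sparsity c(S, S_bar) / d(S, S_bar) of the cut (S, S_bar).

    capacity: dict (u, v) -> capacity of the undirected edge {u, v}
    demand:   dict (s, t) -> demand between the terminals s and t
    Assumes some demand crosses the cut, so that d(S, S_bar) > 0.
    """
    S = set(S)

    def crosses(a, b):
        # True iff exactly one endpoint lies in S.
        return (a in S) != (b in S)

    c = sum(cap for (u, v), cap in capacity.items() if crosses(u, v))
    d = sum(dem for (s, t), dem in demand.items() if crosses(s, t))
    return c / d


# Unit-capacity 4-cycle with uniform (all-pairs, unit) demands.
capacity = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 0): 1}
demand = {(s, t): 1 for s in range(4) for t in range(s + 1, 4)}
print(cut_sparsity({0, 1}, capacity, demand))  # 0.5
```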

The seminal work of Leighton and Rao [18] first considered approximate
max-flow min-cut theorems. They showed that for the case of uniform demands,
the ratio of the sparsest cut to the maximum throughput in any graph is at most
O(log n). Their proof also gives an algorithm to find a cut of sparsity no more
than O(log n) times the maximum throughput (and hence at most O(log n)
times the sparsest cut). This approximation algorithm is a basic subroutine in
approximation algorithms for a variety of NP-hard problems.

For arbitrary demands, such an approximate max-flow min-cut theorem was
discovered by Klein, Rao, Agrawal and Ravi [16], who showed an upper bound
of $O(\log C \log D)$, where C is the sum of all capacities and D is the sum of
all demands. This ratio has since been improved, and the best currently known
bound is O(log k), where k is the number of commodities, due to Linial, London
and Rabinovich [19], and Aumann and Rabani [2] (see the related work section
for details). For arbitrary graphs, this is the best (up to constants) that one can
do, since an expander graph gives a matching lower bound.

Klein, Plotkin and Rao [15] considered restricted families of graphs, and
showed that for graphs excluding a minor of size r, the gap is $O(r^3)$ for the uniform
demand case and $O(r^3 \log k)$ for the general case. The latter result was improved
to $O(r^3 \sqrt{\log k})$ by Rao [26]. In particular, this showed that for planar graphs,
which exclude $K_5$ and $K_{3,3}$ minors, the max-flow min-cut gap is O(1) for the uniform
case. Both the aforementioned results use a decomposition lemma proved
in [15], which says that given a parameter δ, one can decompose a graph excluding
a $K_{r,r}$ minor into clusters of diameter δ, while cutting only an $O(r^3/\delta)$ fraction
of the edges¹. Note that any such decomposition of a path graph must cut an
$\Omega(1/\delta)$ fraction of the edges, and thus the overhead for graphs excluding $K_{r,r}$ was
shown to be $O(r^3)$. Not surprisingly, this decomposition lemma has found several
other applications to approximation algorithms, distributed computing and
embedding results for such graphs.

In this paper we make some progress towards finding the right relation between
the size of the forbidden minor and the overhead of such a decomposition.
We show that for any graph excluding a $K_{r,r}$ minor, we can find a decomposition
into clusters of diameter δ while cutting only an $O(r^2/\delta)$ fraction of the edges. This
shows that the max-flow min-cut gap for such graphs is $O(r^2)$ for the uniform
demands case and $O(r^2 \sqrt{\log n})$ for the general case. It also improves the
performance guarantees of approximation algorithms and embedding results for such
graphs.

What is the right order of magnitude of the overhead of such a decomposition?
An expander graph gives a lower bound of $\Omega(\log r)$; the upper bound we
show is $O(r^2)$. Moreover, can we bound this overhead in terms of some other
topological or metric properties of the graph? We leave these intriguing questions
open.

¹ The second result actually requires the decomposition to have an additional
“padding” property, details of which are deferred to the technical sections.
Related Work

As described above, Klein et al. [16] gave the first nontrivial upper bound of $O(\log C \log D)$ for the multicommodity max-flow min-cut ratio for arbitrary demands. This was improved to $O(\log k^*)$ through the works of Tragoudas [34], Garg, Vazirani and Yannakakis [11], Plotkin and Tardos [23], Aumann and Rabani [2], Linial, London and Rabinovich [19], and Günlük [12] ($k^*$ here is the size of the smallest vertex cover of the demand graph).

For several special classes of graphs, exact max-flow min-cut theorems have 
been proved, for example, by Hu [14], Rothschild and Whinston [29], Dinits (see
[1]), Seymour [31], Lomonosov [20], Seymour [30] and Okamura and Seymour [22]. 
See [10] for more on this vein of work. 

Network decomposition theorems like this one are known for other classes of graphs as well. For general graphs, it is known that it suffices to cut an $O(\log n/\delta)$ fraction of the edges to decompose a graph into clusters of diameter $\delta$, and this is the best one can do for general graphs. For graphs induced by real normed spaces $\mathbb{R}^d_p$, Charikar et al. [7] show that such decompositions exist with an overhead of $O(d^{1/p})$ for $1 \le p \le 2$ and $O(d^{1-1/p})$ for $p > 2$, and that this is tight.

The characterization of planar graphs in terms of forbidden minors is due to Kuratowski [17]. Robertson and Seymour [28] showed that similar characterizations exist for graphs of genus $g$ for any $g$. In particular, it is known that graphs of genus $g$ exclude a $K_{O(\sqrt{g})}$ minor.

The approximate max-flow min-cut theorems have found numerous applications such as oblivious routing, data management, small-area VLSI layout, efficient simulations of one interconnection network by another, etc. For more details on oblivious routing the reader is referred to the papers by Räcke [25], Azar et al. [3], Bienkowski, Korzeniowski and Räcke [5], and Harrelson, Hildrum and Rao [13]. Data management applications have been looked at by Maggs et al. [21]. The reader is referred to Bhatt and Leighton [4] for VLSI layout applications.

The decomposition theorem itself has found applications to approximation algorithms for various NP-hard problems. We mention a few of these applications here. Tardos and Vazirani [33] showed that the decomposition theorem implies an $O(r^3)$ bound on the max (total) flow-min multicut gap and an approximation algorithm for minimum multicut in graphs excluding a $K_r$ minor.

Rao and Richa [27] gave $O(r^3 \log\log n)$-approximation algorithms for minimum linear arrangement and minimum containing interval graph on graphs excluding a $K_r$ minor. Călinescu, Karloff and Rabani [6] gave an $O(r^3)$-approximation algorithm for the 0-extension problem on such graphs, and Feige and Krauthgamer gave an $O(r^3 \log n)$-approximation algorithm for minimum bisection on such graphs.

A slight modification of these decompositions has also been used in the area of metric embeddings. Rao [26] showed that graphs excluding $K_r$ minors can be embedded into $\ell_2$ with distortion $O(r^3 \sqrt{\log n})$. Moreover, these embeddings preserve not only distances but also volumes. Recently, Rabinovich [24] showed how to embed a metric excluding $K_r$ into a line with average distortion $O(r^3)$. For graphs with treewidth $r$, he further improved the embedding to $O(\log r)$ and left open the question of the correct order for graphs excluding a $K_r$ minor. Our results improve the $r^3$ in all the above applications to $r^2$.



A Note on Techniques 

The techniques used in this paper borrow generously from those used by Klein, Plotkin and Rao [15]. They showed that if their algorithm of repeatedly shattering BFS trees $O(r)$ times produced a cluster of large diameter, then they could construct a $K_{r,r}$ minor, consisting of $r$ well-spaced points in the large-diameter cluster and the $r$ roots of the BFS trees. We note that the roots of the BFS trees used were chosen arbitrarily.

Instead, we are somewhat more careful in our choice of the roots. We make 
sure that the roots of the BFS trees constructed are mutually far apart; this 
allows us to construct disjoint paths connecting these roots. This allows us to 
get a better guarantee on the diameter of the clusters. 

2 Preliminaries 

Let $H$ and $G$ be graphs. Suppose that for every vertex $v$ of $H$, $G$ contains a connected subgraph $A(v)$, and for every edge $(u,v)$ in $H$, there is an edge $E(uv)$ connecting $A(u)$ and $A(v)$ in $G$. If the $A(v)$'s are pairwise disjoint, we say that $G$ contains an $H$-minor and call $\bigcup_v A(v)$ an $H$-minor of $G$. We refer to the $A(v)$'s as supernodes and the $E(uv)$'s as superedges.
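To make the definition concrete, the following Python sketch checks whether a proposed family of vertex sets is an $H$-minor of $G$ in the sense just defined: disjoint, individually connected supernodes, plus one superedge per edge of $H$. The adjacency-dict representation and the function names `is_connected_subset` and `is_minor_model` are illustrative assumptions, not part of the paper.

```python
from collections import deque

def is_connected_subset(adj, nodes):
    """Return True if `nodes` induces a connected subgraph of the graph `adj`."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in nodes and w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == nodes

def is_minor_model(adj, h_edges, supernodes):
    """Check that `supernodes` (vertex of H -> set of vertices of G) is an H-minor of G.

    `h_edges` lists the edges (u, v) of H.
    """
    sets = list(supernodes.values())
    # The supernodes A(v) must be pairwise disjoint ...
    if sum(len(s) for s in sets) != len(set().union(*sets)):
        return False
    # ... and each must induce a connected subgraph of G.
    if not all(is_connected_subset(adj, s) for s in sets):
        return False
    # Every edge (u, v) of H needs a superedge E(uv) between A(u) and A(v).
    for u, v in h_edges:
        if not any(w in supernodes[v] for x in supernodes[u] for w in adj[x]):
            return False
    return True
```

For example, in the 4-cycle 0-1-2-3-0, the sets $A(a) = \{0\}$, $A(b) = \{1\}$, $A(c) = \{2, 3\}$ form a $K_3$-minor, while the four singletons do not form a $K_4$-minor.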

We denote by $K_h$ the complete graph on $h$ nodes. Note that if $G$ contains a $K_h$ minor, it contains every minor on $h$ vertices. Thus if $G$ excludes any minor of size $h$, it excludes $K_h$. In particular, excluding a $K_{r,r}$ minor implies excluding a $K_{2r}$ minor. Moreover, $K_{r,r}$ contains a $K_r$ minor. Thus, up to a factor of 2, excluding a $K_r$ minor and excluding a $K_{r,r}$ minor are equivalent.

Given a graph $G = (V, E)$, we can define a natural distance measure on $V$: $d_G(u,v)$ is the length of the shortest path from $u$ to $v$. For a subset $V'$ of $V$, the weak diameter of $V'$ is defined to be $\max_{u,w \in V'} d_G(u,w)$. In this paper, the term diameter will always refer to weak diameter.
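The word "weak" matters because distances are measured in all of $G$, not inside the cluster. A minimal Python sketch, assuming a connected unweighted graph stored as an adjacency dict (the names `bfs_distances` and `weak_diameter` are hypothetical), computes it with one BFS per cluster vertex:

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path distances d_G(source, .) in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def weak_diameter(adj, cluster):
    """max over u, w in the cluster of d_G(u, w), with paths allowed to leave the cluster."""
    best = 0
    for u in cluster:
        dist = bfs_distances(adj, u)
        best = max(best, max(dist[w] for w in cluster))
    return best
```

On a 6-cycle, the cluster $\{0, 3\}$ is disconnected as an induced subgraph, yet its weak diameter is just 3.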

A $\delta$-decomposition $\pi$ of $G = (V,E)$ is a partition of $V$ into subsets $V_1, V_2, \ldots, V_k$ such that each cluster $V_i$ (defined as $\{v \in V : \pi(v) = i\}$) has (weak) diameter at most $\delta$. An edge $e = (u, v)$ is said to be cut by this decomposition if $u$ and $v$ lie in different $V_i$'s.

Let $\Pi$ be a set of $\delta$-decompositions of $G$ and let $\mathcal{D}$ be a distribution over $\Pi$. We say $(\Pi, \mathcal{D})$ is $\alpha$-padded if for any vertex $v$ and any $c < 1/2$, the probability that $v$ is at distance less than $c\delta$ from any cluster boundary is at most $2c\alpha$. More formally, for a partition $\pi$, let $d(v, \pi) = \min_{u : \pi(u) \neq \pi(v)} d(u, v)$. Then we say that $(\Pi, \mathcal{D})$ is $\alpha$-padded if $\Pr_{\pi \sim (\Pi, \mathcal{D})}[d(v, \pi) < c\delta] \le 2c\alpha$. A probabilistic version of the KPR decomposition was shown to be $O(r^3)$-padded in [26]. We shall show that our decomposition is $O(r^2)$-padded.
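The quantity $d(v, \pi)$ can be read off a single BFS from $v$: the first vertex popped that lies in another cluster is a closest one. The Python sketch below assumes an unweighted adjacency dict and a partition given as a dict from vertex to cluster label; `distance_to_boundary` and `empirical_padding_failure` are hypothetical names, and the second function merely estimates the padding probability over a list of sampled partitions.

```python
from collections import deque

def distance_to_boundary(adj, partition, v):
    """d(v, pi) = min over u with pi(u) != pi(v) of d_G(u, v), via BFS from v in G."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if partition[u] != partition[v]:
            return dist[u]          # BFS pops vertices in nondecreasing distance order
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return float("inf")             # no other cluster is reachable from v

def empirical_padding_failure(adj, partitions, v, c, delta):
    """Fraction of the sampled partitions in which d(v, pi) < c * delta."""
    bad = sum(distance_to_boundary(adj, p, v) < c * delta for p in partitions)
    return bad / len(partitions)
```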

For ease of notation in the rest of the paper, we shall give an algorithm to construct an $O(r\delta)$-decomposition of the graph, which cuts an $O(r/\delta)$ fraction of the edges and is $O(r)$-padded. The result claimed in the introduction can of course be derived by scaling $\delta$ by a factor of $O(r)$.
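The scaling step can be spelled out as follows. Here $c_0$ is a symbol introduced for the hidden constant in the $O(r\delta)$ diameter bound, and the padding line assumes $c\,c_0 r$ stays in the range of $c$ for which the $O(r)$-padding guarantee applies.

```latex
\begin{aligned}
\delta' &= \frac{\delta}{c_0 r}, \qquad \operatorname{diam}(V_i) \le c_0 r\,\delta' = \delta,\\
\frac{\mathbb{E}[\#\,\text{cut edges}]}{|E(G)|} &= O\!\left(\frac{r}{\delta'}\right) = O\!\left(\frac{c_0 r^2}{\delta}\right) = O\!\left(\frac{r^2}{\delta}\right),\\
\Pr[d(v,\pi) < c\delta] &= \Pr[d(v,\pi) < (c\,c_0 r)\,\delta'] \le 2\,(c\,c_0 r)\cdot O(r) = 2c\cdot O(r^2).
\end{aligned}
```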

3 The Decomposition Procedure 

We decompose the graph recursively $r - 2$ times. At each level $i$, given a cluster $G_i$, we do the following. We pick, if possible, an appropriate node $a_i$ (explained in the next paragraph) in $G_i$ and construct a breadth-first search tree $T_i$ rooted at $a_i$. We say a vertex $v$ is at level $l$ if its distance in $G_i$ from $a_i$ is $l$. We partition the edges of $G_i$ into $\delta$ classes. For $k = 0, 1, \ldots, \delta - 1$, the $k$-th class consists of edges between nodes at level $j\delta + k$ and $j\delta + k + 1$ for some integer $j \ge 0$. We pick an integer $k \in \{0, \ldots, \delta - 1\}$ uniformly at random, and cut the edges in the $k$-th class. We recurse on the resulting clusters.
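One level of this randomized layered cut can be sketched in Python as below, assuming an unweighted adjacency dict and a cluster given as a vertex set; the function names are hypothetical. Each edge between consecutive BFS levels $l$ and $l+1$ lands in exactly one class (the one with $k = l \bmod \delta$), which is what gives the $1/\delta$ cut probability used later.

```python
import random
from collections import deque

def bfs_levels(adj, nodes, root):
    """BFS levels of `root` inside the subgraph induced by `nodes`."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in nodes and w not in level:
                level[w] = level[u] + 1
                queue.append(w)
    return level

def layered_cut(adj, nodes, root, delta, rng=random):
    """Pick k uniformly from {0, ..., delta-1}; return k and the k-th edge class.

    The k-th class holds the edges between levels j*delta + k and j*delta + k + 1.
    Edges joining two vertices on the same level belong to no class.
    """
    level = bfs_levels(adj, nodes, root)
    k = rng.randrange(delta)
    cut = {(u, w) for u in level for w in adj[u]
           if w in level and level[w] == level[u] + 1 and level[u] % delta == k}
    return k, cut

def components(adj, nodes, cut):
    """Connected components of the subgraph induced by `nodes` once `cut` is removed."""
    removed = cut | {(w, u) for u, w in cut}
    remaining, comps = set(nodes), []
    while remaining:
        start = remaining.pop()
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w in remaining and (u, w) not in removed:
                    remaining.discard(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps
```

On a path rooted at one end with $\delta = 3$, every choice of $k$ cuts every third edge, leaving segments of at most three vertices.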

By appropriate above, we mean a node which is at distance at least $4r\delta$ from each of the roots of the breadth-first search trees in the higher levels of recursion. In case there is no such node in cluster $G_i$, we shatter the cluster in a different way: each resulting cluster consists of vertices close to one of the previous-level roots.

Finally, we further shatter each resulting cluster $G_{r-1}$ into at most $r - 1$ pieces by cutting out clusters of inappropriate nodes: for each of the centers $a_1, \ldots, a_{r-2}$, we cut out a set of vertices close to $a_i$ to form a separate cluster. We redefine $G_{r-1}$ to be the remaining set of nodes $G'$. The above procedures describe the set of edges that are cut; the final clusters are defined by the connected components of the remaining graph.

Figure 1 shows the pseudocode of the procedures. We start by calling the procedure Decompose($G_1 = G$, 1, $\{\}$).
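The two procedures of Figure 1 can be sketched in Python as follows. This is a compact, unoptimized sketch under stated assumptions: an unweighted graph given as an adjacency dict with comparable vertex labels; distances $d_G$ recomputed by BFS on every call; the "appropriate" node taken to be the smallest eligible label (the paper allows any); and the final clusters returned as the connected components left after all cuts. The names, including `decompose_graph`, are hypothetical.

```python
import random
from collections import deque

INF = float("inf")

def bfs(adj, sources, allowed=None):
    """Multi-source BFS distances; if `allowed` is given, stay inside that vertex set."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist and (allowed is None or w in allowed):
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def components(adj, nodes, cut=frozenset()):
    """Connected components of the subgraph induced by `nodes`, ignoring edges in `cut`."""
    remaining, comps = set(nodes), []
    while remaining:
        start = remaining.pop()
        comp, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w in remaining and (u, w) not in cut and (w, u) not in cut:
                    remaining.discard(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps

def shatter(adj, nodes, roots, r, delta, rng, clusters):
    rest = set(nodes)                                        # G'
    for a in roots:
        if not rest:
            break
        d_a = bfs(adj, [a])                                  # d_G(a, .) in the whole graph
        core = {v for v in rest if d_a.get(v, INF) <= 4 * r * delta}   # C_i
        if not core:
            continue
        level = bfs(adj, list(core), allowed=rest)           # BFS tree T_i grown from C_i
        if len(level) == len(rest) and max(level.values()) <= delta:
            clusters.extend(components(adj, rest))           # T_i' covers G', so G' <- empty
            return
        j = rng.randrange(delta)
        piece = {v for v, lv in level.items() if lv <= j}    # C_i together with T_i''
        clusters.extend(components(adj, piece))
        rest -= piece                                        # severs edges between levels j, j+1
    clusters.extend(components(adj, rest))                   # the remaining nodes

def decompose(adj, nodes, i, roots, r, delta, rng, clusters):
    dists = [bfs(adj, [a]) for a in roots]
    far = [v for v in nodes if all(d.get(v, INF) > 4 * r * delta for d in dists)]
    if not far:                                              # step 2: no appropriate node
        shatter(adj, nodes, roots, r, delta, rng, clusters)
        return
    a = min(far)                                             # step 1.1
    level = bfs(adj, [a], allowed=nodes)                     # step 1.2
    if max(level.values()) < delta:                          # step 1.3: < delta + 1 levels
        clusters.append(set(nodes))
        return
    k = rng.randrange(delta)                                 # steps 1.4-1.6: cut S_k
    cut = {(u, w) for u in level for w in adj[u]
           if w in level and level[w] == level[u] + 1 and level[u] % delta == k}
    for comp in components(adj, nodes, cut):                 # step 1.7
        if i < r - 2:
            decompose(adj, comp, i + 1, roots + [a], r, delta, rng, clusters)
        else:
            shatter(adj, comp, roots + [a], r, delta, rng, clusters)

def decompose_graph(adj, r, delta, seed=0):
    """Run Decompose(G_1 = G, 1, {}) on each connected component of G."""
    rng, clusters = random.Random(seed), []
    for comp in components(adj, set(adj)):
        decompose(adj, comp, 1, [], r, delta, rng, clusters)
    return clusters
```

Running each connected component separately is a convenience of this sketch; the paper assumes a single connected $G_1 = G$.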

4 Proof of the Decomposition Procedure 

We first show that the decomposition constructed has the two properties that 
we needed. 

Lemma 1. The expected number of edges that are cut by the above procedure is $O(r|E(G)|/\delta)$.

Proof. Note that we have at most $r$ levels of recursion, and at most $r$ cuts made in any shatter procedure. Thus at most $2r$ cuts potentially involve any particular edge. In each call to Decompose or Shatter, a fixed edge in the cluster has a probability at most $1/\delta$ of being cut (since it is at exactly one level, and we choose one of $\delta$ levels u.a.r.). Thus, any fixed edge has a probability at most $2r/\delta$ of being cut. The claim follows by linearity of expectation.
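In symbols, the two steps of this proof (a union bound over the at most $2r$ calls that can cut a fixed edge $e$, then linearity of expectation) read:

```latex
\Pr[e \text{ is cut}] \;\le\; \sum_{\text{calls that see } e} \Pr[e \text{ is cut in that call}] \;\le\; 2r\cdot\frac{1}{\delta},
\qquad
\mathbb{E}\big[\#\,\text{cut edges}\big] \;=\; \sum_{e \in E(G)} \Pr[e \text{ is cut}] \;\le\; \frac{2r\,|E(G)|}{\delta}.
```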

Lemma 2. The decomposition produced is 2r-padded. 

Proof. From the argument above, each cluster is produced as a result of at most $2r$ random cuts. Fix a vertex $v$ and let $Y_i$ be a random variable denoting its
Algorithm Decompose(G_i, i, p = {a_1, ..., a_{i-1}})

1. if there exists v ∈ G_i such that d_G(a_j, v) > 4rδ for all 1 ≤ j ≤ i - 1 then
   1.1 a_i ← v.
   1.2 Create a BFS tree T_i in G_i rooted at a_i.
   1.3 if T_i contains fewer than δ + 1 levels then
       1.3.1 stop.
   1.4 for k = 0, 1, ..., δ - 1 do
       1.4.1 Define the k-th cut S_k to be the set of edges between nodes
             at level jδ + k and jδ + k + 1 in T_i, for some j ≥ 0.
   1.5 Pick k uniformly at random in {0, 1, ..., δ - 1}. Let S = S_k.
   1.6 Cut all edges in S.
   1.7 for each component G' in G_i - S do
       1.7.1 if i < r - 2 then
             1.7.1.1 Decompose(G', i + 1, {a_1, ..., a_{i-1}, a_i}).
       1.7.2 else
             1.7.2.1 Shatter(G', i, {a_1, ..., a_{i-1}, a_i}).
2. else
   2.1 Shatter(G_i, i - 1, p).

Procedure Shatter(G, k, p = {a_1, ..., a_k})

1. G' ← G.
2. for i = 1, ..., k do
   2.1 C_i ← all nodes v in G' such that d_G(v, a_i) ≤ 4rδ.
   2.2 Create a breadth-first search tree T_i from the nodes in C_i.
   2.3 Let T_i' be the first δ + 1 levels of T_i.
   2.4 if T_i' covers all of G' then
       2.4.1 G' ← ∅.
   2.5 else
       2.5.1 Let j be chosen uniformly at random in {0, 1, ..., δ - 1}.
       2.5.2 Let T_i'' be the subtree of T_i' up to level j.
       2.5.3 Cut all edges at level j.
       2.5.4 G' ← G' - (C_i ∪ T_i'').

Fig. 1. The decomposition procedures.



distance from the boundary of the $i$-th cut. Clearly, $d(v, \pi) = \min_i Y_i$. Moreover, the cut was chosen uniformly at random from $\delta$ equispaced cuts, and thus $Y_i$ is uniformly distributed in $[1, \delta/2]$. Hence $\Pr[Y_i < c\delta] \le 2c$ for any $c < 1/2$. The claim then follows by a simple union bound.
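Written out, the union bound over the at most $2r$ cuts gives exactly the $\alpha$-padding condition with $\alpha = 2r$:

```latex
\Pr\big[d(v,\pi) < c\delta\big] \;=\; \Pr\Big[\min_{i} Y_i < c\delta\Big]
\;\le\; \sum_{i=1}^{2r} \Pr\big[Y_i < c\delta\big] \;\le\; 2r \cdot 2c \;=\; 2c\,(2r).
```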



Having established the required properties of the probabilistic decomposition, we now proceed to show that it is indeed an $O(r\delta)$-decomposition. Note that our decomposition consists of two kinds of clusters: those consisting of vertices close to some root, formed by some call to procedure Shatter, and those formed by the procedure Decompose. We first show that clusters of the first type have small diameters.
Lemma 3. The procedure Shatter cuts out clusters each of weak diameter at most $(8r + 2)\delta$.

Proof. For each $j = 1, \ldots, i - 1$, we define the set $C_j$ to be the set of all vertices in $G_i$ which are at distance at most $4r\delta$ from $a_j$. The procedure cuts out a cluster $T''_j$ formed by taking $C_j$ together with the vertices closer than some randomly chosen threshold $t < \delta$ to $C_j$. Consider any pair of nodes $u$ and $v$ in the same connected component of $T''_j$ in the resulting graph. It must be the case that there is some $i$ such that the distances from $u$ and from $v$ to $a_i$ are at most $(4r + 1)\delta$ in $G$. Therefore, by the triangle inequality, the weak diameter of each such component is at most $(8r + 2)\delta$.
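The triangle-inequality step, for $u$ and $v$ in such a component and the root $a_i$ they are both close to, is simply:

```latex
d_G(u,v) \;\le\; d_G(u,a_i) + d_G(a_i,v) \;\le\; (4r+1)\delta + (4r+1)\delta \;=\; (8r+2)\delta .
```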

We now consider the remaining case. We wish to show that if the graph excludes a $K_r$ minor, then the diameter of each such cluster resulting from our decomposition algorithm is small. We shall show the contrapositive: if the resulting decomposition has some cluster with large diameter, we shall show how to construct a $K_r$ minor in the graph. Let $G_{r-1}$ be a cluster of large diameter and let $a_{r-1}$ and $a_r$ be two vertices in $G_{r-1}$ which are at least distance $4r\delta$ apart. We shall construct a $K_r$ minor containing a supernode centered at each $a_i$, for $i = 1, 2, \ldots, r$. We shall use the paths in the BFS trees to find superedges.

Lemma 4. Suppose that a cluster $G_{r-1}$ output by our algorithm has diameter at least $4r\delta$. Then $G_1$ contains a $K_r$ minor.

Proof. As above, denote by $a_{r-1}$ and $a_r$ two nodes in $G_{r-1}$ at distance at least $4r\delta$ from each other. Note that by our construction, every pair $a_i$ and $a_j$ is at least distance $4r\delta$ apart.

We shall show how to construct a $K_r$ minor in $G_1$. We do so by reverse induction: we give a procedure which, for $b = r - 2, r - 3, \ldots, 1$, constructs a $K_{r-b}$-minor in $G_{b+1}$.

Recall that $G_{i+1}$ consists of at most $\delta$ consecutive layers in the BFS tree $T_i$ rooted at $a_i$. An ancestor-path of $v$ in $T_i$ is the path in $T_i$ from $v$ to the root $a_i$ of $T_i$. We shall construct the minor using suitable ancestor-paths in the $T_i$'s.

Given a $K_b$-minor in $G$ such that starting at each supernode $A(g)$ there is a path $P_g$, we say that the paths $\{P_g\}$ are tails if each path $P_g$ is disjoint from the other paths and also from all supernodes except $A(g)$. We shall refer to $P_g$'s ending node (outside of $A(g)$) as the tip of the tail $P_g$ and denote it by $\mathrm{tip}(P_g)$.

Klein, Plotkin and Rao [15] show how to construct the minor inductively by also constructing tails which are ancestor-paths of $T_i$, and special nodes (which they called middle nodes) on the tails which are far apart, and using them to further construct disjoint components of the minor. We use a similar approach.

We shall construct a $K_{r-b}$-minor in $G_{b+1}$. In addition, we construct $r - b$ tails $\{P_i\}$, one per supernode, which are ancestor-paths of $T_b$ of length exactly $4\delta$ such that for each tail $P_i$, a middle node $h_i$ of $P_i$ is at distance at least $4b\delta$ from the other middle nodes $h_j$. Moreover, we require that every middle node is at distance at least $4b\delta$ from the root $a_b$ of $T_b$. This shall be our (reverse) inductive claim.
For the basis step, when $b = r - 2$, let $P$ be the shortest path from $a_{r-1}$ to $a_r$ in $G_{r-1}$ (since $G_{r-1}$ is connected, such a path exists). We construct a $K_2$-minor from the path $P$. We let $A(a_r)$ be a path of length $4\delta - 1$ on $P$ starting from $a_r$. The other supernode $A(a_{r-1})$ is then $P - A(a_r)$. We construct the tails by taking $P_j$ to be the ancestor-paths in $T_{r-2}$ of length $4\delta$ from $a_j$, for $j \in \{r, r - 1\}$. It can be checked that they are proper tails and the middle nodes in these tails are at distance at least $4(r - 2)\delta$ from each other. Also the middle nodes in these tails are at distance at least $4(r - 2)\delta$ from $a_{r-2}$.







Fig. 2. The inductive step. 



We now show the inductive step. Assuming that the claim is true for $b = i + 1$, we want to show that the claim is true for $b = i$, i.e., $G_{i+1}$ contains $K_{r-i}$ as a minor and a new set of tails with the required properties.

We first construct the minor. For $j \ge i + 2$, we create supernodes $A'(a_j)$ from the supernodes of $K_{r-i-1}$ as follows. We let $A'(a_j)$ be $A(a_j) \cup (P_j - \{\mathrm{tip}(P_j)\})$. From the inductive assumption, these supernodes are disjoint. This gives us $r - i - 1$ supernodes. We let $A'(a_{i+1})$ be the union of all ancestor-paths in $T_{i+1}$ starting from the tips of all the tails $\{P_j\}$.

We must show that $A'(a_{i+1})$ is disjoint from all other new supernodes. Since we create tails of length $4\delta$ from the ancestor-paths in $T_{i+1}$, the end nodes of the tails lie outside the subgraph $G_{i+2}$; therefore, the supernodes $A(a_j)$, lying inside $G_{i+2}$, and $A'(a_{i+1})$ are disjoint. Also, the tails $\{P_j\}$ and $A'(a_{i+1})$ are disjoint by construction. Moreover, the last edges on the paths $P_j$ give us the required additional superedges. This shows that $G_{i+1}$ contains a $K_{r-i}$-minor.

To finish the inductive claim, we need to construct the tails with the desired properties. For each middle node $h_j$, let the tail $P'_j$ be the ancestor-path in $T_i$ from $h_j$ of length $4\delta$. These tails are mutually disjoint because the $h_j$'s are at distance at least $4(i + 1)\delta$ from each other in $G$. We also create another tail $P'_{i+1}$ starting from $a_{i+1}$ in the same way. It is straightforward to verify that the new middle nodes $\{h'_j\}$ are at the right distance from each other.

We must also show that the tails $P'_j$ are disjoint from all $A'(a_k)$ where $k \neq j$. Consider any node $v$ in $A'(a_k)$. From the choice of $h_j$, the levels of $v$ and $h_j$ in $T_{i+1}$ differ by more than $\delta$. This implies that $v$ does not lie on the ancestor-path of $h_j$ in $T_i$ (since $G_{i+1}$ consists of at most $\delta$ consecutive layers of $T_i$, there is a path of length at most $\delta$ from $h_j$ to any $T_i$-ancestor $w$ of $h_j$ lying in $G_{i+1}$; thus $h_j$ and $w$ would be within $\delta$ layers of each other in $T_{i+1}$, and hence $v$ is different from $w$). Thus for any $j \neq k$, $P'_j$ is disjoint from $A'(a_k)$. To show that $P'_j$ does not cross any tail $P'_k$, we note that the distance between $h_j$ and $h_k$ is more than $8\delta$. Finally, since $a_{i+1}$ is at distance at least $4(i + 1)\delta$ from all the middle nodes $h_j$, the path $P'_{i+1}$ is also a proper tail.

It only remains to show that the middle nodes $h'_j$ are at distance at least $4i\delta$ from $a_i$. From our construction, $a_i$ is at distance at least $4r\delta$ from $a_j$, where $j > i$. We know inductively that the new middle nodes $h'_j$ are at distance at most $2(r - i)\delta$ from $a_j$. By the triangle inequality, the distance from $h'_j$ to $a_i$ is at least $4r\delta - 2(r - i)\delta \ge 4i\delta$. This completes the inductive argument.
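Writing $a$ for the root whose distance from the new middle nodes is being bounded (notation introduced only for this display), the triangle-inequality step expands to:

```latex
d(h'_j, a) \;\ge\; d(a, a_j) - d(a_j, h'_j) \;\ge\; 4r\delta - 2(r-i)\delta \;=\; 2(r+i)\delta \;\ge\; 4i\delta \qquad (\text{since } i \le r).
```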

Thus, when $b = 1$, the inductive claim says that $G_2$ contains a $K_{r-1}$-minor and tails with the appropriate properties. We can construct a $K_r$-minor in $G_1$ as in the inductive step. This completes the proof of Lemma 4.

From the above lemmas, we have the main theorem. 

Theorem 1. Given a graph $G$ and parameters $\delta$ and $r$, we can either find a $K_r$ minor in $G$ or find an $O(r)$-padded $O(r\delta)$-probabilistic decomposition of $G$ which cuts at most $O(mr/\delta)$ edges in expectation, where $m = |E(G)|$.

We can also generalize this procedure for graphs with distances and weights 
on the edges. Moreover, if the padding property is not required, we can easily 
derandomize the algorithm by picking the best cut at each step. 



^ This is exactly the “moat” argument in [15]. 
Abstract. We study two packing problems that arise in the area of
dissemination-based information systems; a second theme is the study 
of distributed approximation algorithms. The problems considered have 
the property that the space occupied by a collection of objects together 
could be significantly less than the sum of the sizes of the individual 
objects. In the Channel Allocation Problem, there are users who request 
subsets of items. There are a fixed number of channels that can carry 
an arbitrary amount of information. Each user must get all of the re- 
quested items from one channel, i.e., all the data items of each request 
must be broadcast on some channel. The load on any channel is the 
number of items that are broadcast on that channel; the objective is to 
minimize the maximum load on any channel. We present approximation 
algorithms for this problem and also show that the problem is MAX-SNP 
hard. The second problem is the Edge Partitioning Problem addressed 
by Goldschmidt, Hochbaum, Levin, and Olinick (Networks, 41:13-23, 2003). Each channel here can deliver information to at most $k$ users, and we aim to minimize the total load on all channels. We present an $O(n^{1/3})$-approximation algorithm and also show that the algorithm can
be made fully distributed with the same approximation guarantee; we 
also generalize to the case of hypergraphs. 



1 Introduction 

We develop approximation algorithms for certain packing problems arising in broadcast systems; these have the property that the objects to be packed “overlap”. In other words, the space occupied by a collection of objects together could be significantly less than the sum of the sizes of the individual objects. This is in contrast with traditional packing problems in which the objects to be packed are disjoint. A second theme of our work is that some of our algorithms can also be made completely distributed and implemented to run in polylogarithmic time, with only a constant-factor loss in the approximation guarantee. We study problems that arise in the area of dissemination-based information systems [1,2,11,12,23]. Such systems are used in application domains such as public-safety systems, election-result servers and stock tickers [3]. One characteristic of
dissemination-based applications is that there is a high degree of overlap in the 
user needs. Since many user-requests in such applications are similar, it would 
be a waste of resources to transmit the information to each user separately. For 
users with similar requests, if their requests are grouped and transmitted only 
once then this wastage of bandwidth could be avoided. On the negative side, the 
grouped data may contain information that would be irrelevant for some users. 
Hence, the users would have to process the broadcast information to obtain the 
data that they want. Thus, there is a trade-off between reducing the bandwidth 
used by grouping the requests and the amount of processing of the broadcast 
data that the clients need to do to obtain the data that they requested. In our 
model, there is a transmitter such as a satellite that broadcasts information on 
a fixed number of physical multicast channels. Each user is assigned to some 
channel on which the user gets his/her requested data. Our work deals with 
satisfying the client requests in a timely manner, while minimizing the amount 
of bandwidth used. 

Problems and Results. The first problem, Channel Allocation, can be defined as follows. There is a set of topics (e.g., news, sports events, stock-market updates), as well as a set of users. Each user requests a subset of items (topics). There are a fixed number of channels that can each carry an arbitrary amount of information. Each user must get all of the requested items from one channel, i.e., all the data items of each request must be broadcast on some channel. The load on any channel is the number of items that are broadcast on that channel, and the goal is to minimize the maximum load on any channel. Formally, we are given: (i) a set of topics $T = \{t_1, t_2, \ldots, t_n\}$, (ii) a collection of user-requests $R = \{R_1, R_2, \ldots, R_m\}$, where $R_i \subseteq T$ for all $i$, and $\max_i |R_i|$ is a constant $w$; and (iii) a positive integer $k$ denoting the number of channels. Our goal is to construct a family $C = \{C_1, C_2, \ldots, C_k\}$, $C_j \subseteq T$, such that for each set $R_i \in R$, there exists a $C_j$ such that $R_i \subseteq C_j$. For all $j$, $C_j$ constitutes the set of topics on channel $j$. If $R_i \subseteq C_j$ then we say that request $R_i$ is satisfied by channel $j$. The load on channel $j$ is the number of topics placed on it, i.e., $|C_j|$. The objective is to minimize the maximum load on any channel, i.e., to minimize $\max_j |C_j|$. We will denote this problem as CHA.
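As a concrete reading of this definition, the following Python sketch (a hypothetical helper, not from the paper) returns the CHA objective $\max_j |C_j|$ of a candidate family of channels, or `None` when some request $R_i$ is contained in no channel:

```python
def cha_max_load(requests, channels):
    """Return max_j |C_j| if every request R_i is contained in some channel C_j, else None."""
    chans = [set(c) for c in channels]
    for req in requests:
        if not any(set(req) <= c for c in chans):
            return None                      # request R_i is not satisfied by any channel
    return max((len(c) for c in chans), default=0)
```

For instance, the requests $\{1,2\}$, $\{2,3\}$, $\{4\}$ are all satisfied by the channels $\{1,2,3\}$ and $\{4\}$, with maximum load 3.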

The second problem, Edge-Partitioning (EP), basically arises by bounding the number of requests that any channel can handle in CHA. The setting is the same as in CHA, with the additional constraint that each $R_i$ must be assigned to some channel $C_j$ for which $R_i \subseteq C_j$ holds; furthermore, the number of requests (i.e., users) assigned to a channel should be at most $k$. Subject to these constraints, the objective is to minimize $\sum_j |C_j|$. This problem was studied by Goldschmidt et al. [14] for the special case of $w = 2$, in the context of optical network design. (That is, given a graph $G$, we seek to cover the edges by subgraphs containing at most $k$ edges each, and we aim to minimize the total number of vertices in the chosen subgraphs.) The work of [14] considers the case $w = 2$, and presents an $O(\sqrt{k})$-approximation algorithm.
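Once requests are assigned to channels, the cheapest choice of $C_j$ is the union of the topics of its requests, so an EP solution can be scored from its grouping alone. The Python sketch below assumes that representation (each group of requests gets its own channel; the helper name is hypothetical). For $w = 2$ the requests are the edges of a graph and the cost is the total number of vertices in the chosen subgraphs.

```python
def ep_total_load(groups, k):
    """Total load sum_j |C_j| when each group of requests gets its own channel.

    Each channel carries exactly the union of the topics of its assigned requests.
    Returns None if some channel would receive more than k requests.
    """
    if any(len(g) > k for g in groups):
        return None
    return sum(len(set().union(*(set(r) for r in g))) for g in groups)
```

For the three edges of a triangle, one channel holding all of them costs 3 when $k = 3$, while splitting them two-and-one (as required when $k = 2$) costs $3 + 2 = 5$.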

We give an $O(n^{\frac{w-1}{w+1}} (\lg n)^{\frac{1}{w+1}})$-approximation algorithm for CHA; this is obtained by taking the better of a random construction and the output of a suitable set-cover problem. We also show that the problem is MAX-SNP hard for all $w \ge 4$; thus, a polynomial time approximation scheme for the problem would imply that P = NP. For the case $w = 2$, CHA is the following graph problem: cover all the edges of a given graph by a given number of subgraphs, minimizing the maximum number of vertices in these subgraphs. Here, we obtain an $O(n^{1/3-\epsilon})$-approximation algorithm for some positive constant $\epsilon$. We also show that the problem is NP-hard for $w = 2$, even when there are only two channels.

For EP, we obtain an $O(w \cdot n^{\frac{w-1}{w+1}})$-approximation algorithm, by taking the better of a simple approach and a greedy algorithm. Recall that an $O(\sqrt{k})$-approximation algorithm was developed in [14] for the case $w = 2$; in this case, our bound of $O(n^{1/3})$ is incomparable with $O(\sqrt{k})$ (note that $k$ can take on values from 1 up to $m$, the number of edges in the graph). We then present
an alternative approach with the same approximation guarantee for the case 
w = 2, with the help of certain tail bounds for sums of correlated random 
variables [17,18,22]. We show that this can be implemented as a polylogarithmic 
time, distributed algorithm, where each arriving user only communicates with 
the servers handling the topics that the user is interested in. This brings us to 
the next main theme of this paper: that of distributed approximation algorithms. 
Given the emergence of various contexts where distributed agents (e.g., in the 
Internet) make decisions using only local information, it is natural to ask whether 
the notion of approximation algorithms can be brought to bear fruitfully in such 
contexts. Not many polylogarithmic-time distributed approximation algorithms 
are known: the few that we are aware of include [15,19,9]. We hope that the 
intriguing mix of approximation and the constraint of locality will be understood 
further by research in distributed approximation algorithms. 

Related Work. A problem related to the ones we study is the well-known Dense $k$-Subgraph problem (DkS): given a graph $G$, select a subset of $k$ vertices whose induced subgraph has the maximum number of edges. In the language of CHA, we have $w = 2$ and one channel with capacity $k$; we wish to satisfy the maximum number of user requests. This problem is NP-hard, and an $O(n^{\alpha})$-approximate solution for some $\alpha < 1/3$ was given by Feige et al. [10]. The problem is not even known to be MAX-SNP hard. Also, Daskin et al. [8] discuss the following related printed circuit board (PCB) assembly problem. In this problem we have a list of PCBs and a list of different component types required by each PCB. The machine that produces the PCBs can hold only a fixed number of different component types, and can be loaded any number of times. The goal here is to minimize the sum over all component types of the number of times each component type is loaded. The users correspond to the PCBs, the items correspond to the different component types required by a PCB and the channel corresponds to the machine. In other words, the channel capacity is fixed, any number of channels could be used and the objective is to minimize the sum of the channel loads. They show that the problem is NP-hard. For the general version of the problem in which each component type (item) and PCB (user) is associated with a cost, they provide a heuristic solution. They also provide a branch-and-bound algorithm that can optimally solve small to moderate sized instances of the problem.

Due to the lack of space, many proofs are deferred to the full version. 

2 The Channel Allocation Problem 

2.1 Algorithm 

Our approach employs two different algorithms and chooses a solution of lower 
cost from the two solutions obtained. As we will see, these two algorithms per- 
form “well” on different sets of inputs that cover the entire spectrum of inputs. 

The first algorithm is the following simple randomized algorithm. Independently place each topic on each channel $i$, $1 \le i \le k$, with a probability $p$ which will be determined later. We will show that with a sufficiently high probability we obtain a feasible solution whose cost is close to its expected cost. This probability can be boosted by repeating the random process.
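A minimal Python sketch of this randomized construction with repetition follows. The value $p = \min(1, (c \ln m / k)^{1/w})$ and the constant $c = 2$ are assumptions of the sketch, chosen only so that a fixed request of size at most $w$ is likely to land on some channel; the function names are hypothetical.

```python
import math
import random

def random_channels(topics, k, p, rng):
    """Place each topic on each of the k channels independently with probability p."""
    return [{t for t in topics if rng.random() < p} for _ in range(k)]

def randomized_cha(topics, requests, k, w, c=2.0, tries=100, seed=0):
    """Repeat the random construction until every request fits on some channel.

    The choice p = min(1, (c ln m / k)^(1/w)) and the constant c are assumptions
    of this sketch, not values fixed by the text.
    """
    rng = random.Random(seed)
    m = max(len(requests), 2)                # avoid ln 1 = 0 for tiny instances
    p = min(1.0, (c * math.log(m) / k) ** (1.0 / w))
    for _ in range(tries):
        channels = random_channels(topics, k, p, rng)
        if all(any(set(r) <= ch for ch in channels) for r in requests):
            return channels
    return None                              # every attempt left some request unsatisfied
```

With few channels the formula caps $p$ at 1, so every channel receives every topic; this matches the trivial $k$-approximation used below for small $k$.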

The second algorithm uses the greedy set cover algorithm [6,20,21] on the set cover instance, $I$, that is constructed as follows. The elements of the instance, $I$, are the requests in $R$. Let $t$ be some fixed large constant. For all $z$, $1 \le z \le \ldots$, consider all $\binom{m}{z}$ combinations of $z$ elements. For each combination, $Z$, let $S_Z$ be the set of requests corresponding to the elements in $Z$ and let $T_Z$ be the topics obtained by taking the union of the requests in $S_Z$. The combination $Z$ forms a set in $I$ iff $|T_Z| \le t$. The size of our set cover instance satisfies $|I| \le \sum_z \binom{m}{z}$. Let $M = \max_{S_i \in I}\{|S_i|\} = O(t^w)$ be the size of the largest set in $I$. Since $t$ and $w$ are constants, $|I|$ is polynomially bounded and $M$ is a constant. Now we use the greedy set cover algorithm on $I$ to obtain a set cover for $R$. For each set $S_Z$ chosen by the set cover algorithm we create a new channel. The topics in $T_Z$ constitute this channel and hence the requests in $S_Z$ are satisfied by this channel. The set cover covers all requests in $R$. This solution may be infeasible as it may use more than $k$ channels. By using Lemma 1 we can convert it into a feasible solution using $k$ channels.
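The construction of $I$ and the greedy cover can be sketched in Python as below. The largest combination size `zmax` is a parameter of this sketch (the exact bound used in the text is not recoverable here), sets are keyed by the tuple of request indices $Z$, and ties in the greedy choice go to the first set in insertion order. The function names are hypothetical.

```python
from itertools import combinations

def build_set_cover_instance(requests, t, zmax):
    """Sets S_Z, keyed by request-index combinations Z whose topic union T_Z has <= t topics."""
    sets = {}
    for z in range(1, zmax + 1):
        for combo in combinations(range(len(requests)), z):
            topics = frozenset().union(*(requests[i] for i in combo))
            if len(topics) <= t:
                sets[combo] = frozenset(combo)
    return sets

def greedy_set_cover(universe, sets):
    """Repeatedly pick the set that covers the most still-uncovered elements."""
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(sets, key=lambda name: len(uncovered & sets[name]))
        if not uncovered & sets[best]:
            raise ValueError("the sets do not cover the universe")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen
```

Each chosen combination $Z$ then becomes one channel carrying the topics $T_Z$, e.g. `frozenset().union(*(requests[i] for i in Z))`.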

We will now analyze our algorithm. Note that we can obtain solutions with good approximation guarantees trivially for the following values of $w$ and $k$. If $w = 1$ we can get an optimal solution of cost $\lceil m/k \rceil$. If $k \le 2\ln m$, we get a $2\ln m$ approximation guarantee, since for any $k$ we can obtain a $k$-approximate solution by placing all topics on each of the $k$ channels. If $k \ge m/(\ln n)^w$, we can partition the requests into groups of size $\lceil (\ln n)^w \rceil$ and place each group on a separate channel. This is a feasible solution as there are at most $k\lceil (\ln n)^w \rceil$ requests. The cost of our solution is at most $O(w(\ln n)^w)$, thus giving an approximation guarantee of $O((\ln n)^w)$. For the rest of the analysis we will assume that $w \ge 2$ and $2\ln m < k < m/(\ln n)^w$.

Let an $(X, Y)$ solution to CHA denote an allocation of $Y$ channels such that the load on each of the channels is at most $X$.

Lemma 1. If $(L, k')$, where $k' \ge k$, is a solution to CHA, then there exists a feasible $(\lceil k'/k \rceil L, k)$ solution.
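The proof of Lemma 1 is deferred in the source; one natural construction consistent with it, and an assumption of this sketch, merges consecutive groups of $\lceil k'/k \rceil$ channels. Every request that fit on an input channel still fits on the merged one, and loads grow by at most that factor. The helper name is hypothetical.

```python
def merge_channels(channels, k):
    """Merge k' >= k channels into exactly k channels, each a union of ceil(k'/k) inputs.

    If every input channel has load at most L, each merged channel has load
    at most ceil(k'/k) * L; unused slots are padded with empty channels.
    """
    size = max(1, -(-len(channels) // k))          # ceil(k' / k), at least 1
    merged = [set().union(*channels[i:i + size]) for i in range(0, len(channels), size)]
    merged += [set() for _ in range(k - len(merged))]
    return merged
```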

Lemma 2. With a sufficiently high probability, the randomized algorithm gives an $(O(n(\ln m/k)^{1/w}), k)$ solution.
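The feasibility half of Lemma 2 can be seen from a union bound. In the sketch below, $c > 1$ is a constant introduced here, $p \le 1$ needs $k \ge c \ln m$ (consistent with the assumption $k > 2\ln m$ when $c = 2$), and the concentration of the maximum load around its expectation is not shown.

```latex
\begin{aligned}
\Pr[R_i \subseteq C_j] &= p^{|R_i|} \ge p^{w},\\
\Pr[\text{no channel satisfies } R_i] &= \big(1 - p^{|R_i|}\big)^{k} \le \big(1 - p^{w}\big)^{k} \le e^{-k p^{w}},\\
\Pr[\text{some request is unsatisfied}] &\le m\, e^{-k p^{w}} = m^{1-c} \quad\text{for } p = \Big(\frac{c \ln m}{k}\Big)^{1/w},\\
\mathbb{E}\big[|C_j|\big] &= n p = n \Big(\frac{c \ln m}{k}\Big)^{1/w}.
\end{aligned}
```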

Lemma 3. The set cover approach gives an $(O((L_{OPT})^{w}), k)$ solution.

Theorem 1. There is a polynomial-time algorithm for CHA that gives an $O(n^{\frac{w-1}{w+1}} (\lg n)^{\frac{1}{w+1}})$-approximate solution.

2.2 Hardness of Approximation 

We will prove that CHA is MAX-SNP hard via a reduction from the 3-Dimensional Matching problem (3DM), which can be formally stated as follows.
3-Dimensional Matching (3DM): Given three disjoint sets of elements, $X$, $Y$, and $Z$, such that $|X| = |Y| = |Z| = q$, and a set of triples, $C$, each triple containing one element from each of $X$, $Y$, and $Z$, the goal is to find the maximum number of pairwise disjoint triples.

For clarity, we prove Theorem 2 for all $w \ge 10$. We can show that CHA is MAX-SNP hard for all $w \ge 4$ by replacing each request, $D$, of size 10 in the reduction by requests of size 4, where each new request is a subset of $D$.

Theorem 2. Unless P = NP, for some fixed $\epsilon > 0$, the channel allocation problem is hard to approximate to within a factor of $(1 + \epsilon)$.

Proof. Let 3DM($I$) denote the cost of an optimal solution to instance $I$. Similar definitions apply to CHA. 3DM is NP-complete [13]. We will prove the following.

$$I \in \text{3DM} \implies \text{CHA}(f(I)) \le 12 \tag{1}$$

$$I \notin \text{3DM} \implies \text{CHA}(f(I)) \ge 13 \tag{2}$$

The function $f$ shows that an approximation algorithm for CHA yielding a solution of cost lower than $\frac{13}{12}\,\mathrm{OPT}$ would imply that P = NP. Our reduction is inspired by the NP-hardness reduction from 3DM to Partition into Triangles [13].



Fig. 1. Gadget corresponding to each triple.

Consider a 3DM instance I. For notational convenience we will drop the parameter I while using the symbols; e.g., we will use C instead of C(I) to denote the set of triples in I. We now describe the function f that converts I into a CHA instance, f(I), as follows. The CHA instance that we construct will have big requests, small requests and dummy requests. We start by describing the big requests. There are (3q + 9|C|) big requests, one for each of the 3q elements in I and 9 for each triple in C. The big requests are mutually disjoint. Each big request, B, has 4 topics, t_i^B, 1 ≤ i ≤ 4. We will now describe the small requests. For each triple C_j ∈ C, we construct the gadget shown in Figure 1. Each gadget consists of 12 big requests (mentioned earlier), 9 of which are unique to the gadget; the other 3 big requests, corresponding to the elements in C_j, are shared between the gadgets corresponding to the triples containing those elements. Each edge connecting two big requests U and V represents 16 small requests, {t_i^U} ∪ {t_j^V}, for all combinations of i, j with 1 ≤ i, j ≤ 4. Thus each small request has size 2 and contains one topic from each of the two big requests. We also have (144|C| − 48q) dummy requests of size 10 each. The dummy requests are mutually disjoint and disjoint from all other requests. This completes our description of requests. The set of topics is the union of the big requests and dummy requests. The total number of channels is 4q + 3(|C| − q) + 96q + 144(|C| − q) = q + 3|C| + 144|C| − 48q.
Before we prove (1) and (2), let us define some notation. Consider a gadget representing a triple C_j ∈ C. Let T_i^j, 1 ≤ i ≤ 7, denote the set of 12 topics corresponding to the big requests that form the triangle T_i^j as seen in Figure 1. For notational convenience, we will drop the superscript j. Note that T_i denotes a set of topics as well as a triangle; the reference will be clear from the context in which it is used. A channel satisfying a triangle T_i means that the set of topics T_i is placed on the channel, and hence the 3 big requests that form the vertices of the triangle and the 48 small requests represented by the edges of the triangle are satisfied.
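As a sanity check on the counting in the construction, here is a small Python sketch (the function name `reduction_counts` is hypothetical) that tallies the big and dummy requests of f(I) and verifies that the two expressions given for the number of channels agree. Small requests are not counted, since their number depends on the edge structure of the gadget in Figure 1.

```python
def reduction_counts(q: int, num_triples: int) -> dict:
    """Request and channel counts for f(I), with |X| = |Y| = |Z| = q and |C| = num_triples."""
    big = 3 * q + 9 * num_triples           # one per element, nine per triple
    dummy = 144 * num_triples - 48 * q      # dummy requests of size 10
    channels = 4 * q + 3 * (num_triples - q) + 96 * q + 144 * (num_triples - q)
    # The simplified right-hand side stated in the text must agree.
    assert channels == q + 3 * num_triples + 144 * num_triples - 48 * q
    return {"big": big, "topics_in_big": 4 * big, "dummy": dummy, "channels": channels}

print(reduction_counts(q=2, num_triples=3))
# {'big': 33, 'topics_in_big': 132, 'dummy': 336, 'channels': 347}
```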



Claim. If 3DM(I) = q then CHA(f(I)) ≤ 12.
Claim. If 3DM(I) < q then CHA(f(I)) ≥ 13.
3 CHA Instances with Small Set-Size

In this section we consider CHA instances in which user requests are of size at most 2. In this case the user requests can be modeled as a graph in which the vertices represent the topics and the edges represent the user requests, i.e., an edge (i, j) represents a user requesting topics i and j. The goal is to allocate channels while minimizing max_{1≤i≤k} L_i. We can show:

Theorem 3. CHA is NP-hard when each request is of size two and there are 
two channels. 

We next give an approximation algorithm for CHA. Our algorithm uses the solution for the Dense k-Subgraph problem (DkS) described in Section 1. Specifically, we use the approximation algorithm DkS(G, k) due to [10].

Algorithm: Guess the optimal load by trying out all possible values. Consider a guess L. Invoke DkS(G, L), which returns an approximate solution for the densest subgraph on L vertices. Place these L vertices returned by DkS(G, L) onto a new channel. Remove all the covered edges from G. If any edges remain uncovered, invoke DkS again.
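The loop above can be sketched in Python under one loudly stated substitution: the DkS approximation algorithm of [10] is replaced by a simple greedy peeling heuristic (repeatedly delete a minimum-degree vertex until L vertices remain), which carries no approximation guarantee here. The names `peel_dense_subgraph` and `allocate` are hypothetical. For a simple graph and L ≥ 2, the peeled vertex set always still contains an uncovered edge, so each new channel makes progress.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = FrozenSet[int]

def peel_dense_subgraph(edges: Set[Edge], size: int) -> Set[int]:
    """Stand-in for DkS(G, size): greedy peeling, i.e. repeatedly delete a
    minimum-degree vertex (ties: smallest id) until `size` vertices remain."""
    adj: Dict[int, Set[int]] = {}
    for e in edges:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    while len(adj) > size:
        u = min(adj, key=lambda x: (len(adj[x]), x))
        for v in adj.pop(u):
            adj[v].discard(u)
    return set(adj)

def allocate(edges: Set[Edge], k: int) -> Tuple[int, List[Set[int]]]:
    """Try guesses L = 2, 3, ...; for each, open channels of at most L topics
    until every request (edge) is covered, and accept L if <= k channels suffice."""
    n = len(set().union(*edges)) if edges else 0
    for load in range(2, max(n, 2) + 1):
        remaining, channels = set(edges), []
        while remaining and len(channels) <= k:
            chosen = peel_dense_subgraph(remaining, load)
            channels.append(chosen)
            remaining = {e for e in remaining if not e <= chosen}
        if not remaining and len(channels) <= k:
            return load, channels
    raise ValueError("no allocation with at most k channels")

# Usage: a triangle {1, 2, 3} plus the edge {4, 5}, with k = 2 channels.
edges = {frozenset(p) for p in [(1, 2), (2, 3), (1, 3), (4, 5)]}
load, chans = allocate(edges, k=2)
print(load, sorted(sorted(c) for c in chans))  # 3 [[1, 2, 3], [4, 5]]
```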

It is not hard to show that we get an O(ρ lg n)-approximate solution here, where ρ is the approximation guarantee of DkS(G, k). Thus we have

Theorem 4. For a certain constant α < 1/3, there is an O(n^α ln n)-approximation algorithm for CHA.



4 The Edge-Partitioning Problem 

We now present approximation algorithms for EP: a sequential, deterministic algorithm in Section 4.1, and a distributed, randomized one in Section 4.2. We will throughout use hypergraph covering terminology: given a hypergraph H = (V, E) with n vertices and m edges (each having a fixed number w of vertices), we wish to partition the edges into sets of at most k edges each, in order to minimize the sum of the total number of vertices in each set (“each set” here means “each block of the partition”).

4.1 A Deterministic Algorithm 

We now present a deterministic O(w · n^{(w−1)/(w+1)})-approximation algorithm; see Theorem 5. Recall that the degree of a vertex in a hypergraph is the number of edges incident to it (i.e., that contain it). Let H = (V, E) be the given hypergraph. We start by considering the following greedy algorithm:

EdgePartition(H = (V, E), k)

    F ← ∅
    While |E| > k do
        Remove the isolated vertices from V
        H' = (V', E') ← H = (V, E)
        While |E'| > k do
            u ← a lowest degree vertex in H'
            L ← {edges in E' that are incident to u}
            V' ← V' \ {u}
            E' ← E' \ L
        End
        R ← E' ∪ L
        Arbitrarily remove some edges from R to make |R| = k
        F ← F ∪ {R}    (i.e., R is the set of edges assigned to a new channel)
        H ← H \ R
    End
    F ← F ∪ {E}

Lemma 4. For each iteration of the outer while loop, the number of vertices in R is at most …, where n' = |V'| and m' = |E'| for the H' = (V', E') being used in that iteration.



Lemma 5. The total number of vertices in the edge partition is at most (wn/(1 − 1/w)) · (m/k)^{(w−1)/w}.



Lemma 6. The optimal solution has at least max{n, …} vertices.



Lemma 7. From Lemmas 5 and 6, the approximation ratio of our algorithm is at most …



Note that in the case of graphs, i.e., w = 2, the approximation ratio of our algorithm is at most 4√(m/k). Also note that the constant factor of this ratio can be improved in the analysis for w = 2. The algorithm of [14] works for w = 2, and their approximation ratio for w = 2 is about …



Lemma 8. By partitioning E into m parts such that each part consists of exactly one edge, we obtain a trivial algorithm whose approximation ratio is at most …



Theorem 5. By running the first algorithm and the trivial algorithm and taking the better solution, we obtain an algorithm with approximation ratio at most 2w · n^{(w−1)/(w+1)}. The running time of the composite algorithm is O((m/k)(m + n)).
4.2 A Distributed Algorithm for the Graph Case

We now present a randomized distributed O(n^{1/3})-approximation algorithm for the case where the given hypergraph H is a graph G = (V, E). Recall that
in the present case where w = 2, each user basically requests two topics. We 
consider a fully distributed model where each broadcast channel has a server 
running it, and where each topic also has its own distribution server. A topic- 
distribution server can communicate with a channel server, if the former wants 
its topic broadcast on that channel. Each arriving user communicates only with 
the two topic-distribution servers of interest to it; thus, the model is distributed 
in the sense that the users need not have any knowledge about each other. By 
interpreting the topics as vertices and the two topics of interest to a user as an edge, we thus equivalently get the following familiar distributed point-to-point model. Each vertex in the graph G = (V, E) has a processor which
can communicate with its neighbors, as well as with the servers handling the 
channels. Each processor knows the values of n (which is a static parameter - 
the number of topics) and k. We now wish to assign each edge to one channel 
(from among an arbitrary number of channels), such that each channel has at 
most k edges assigned to it. (The two processors at the end-point of an edge 
co-operatively decide which channel that edge gets assigned to.) The goal is to 
minimize the sum, over all channels i, of the total number of vertices that use 
i (a vertex v uses i iff some edge incident to v is assigned to i). Computation 
proceeds in rounds: in each round, every node communicates with its neighbors, 
and updates its internal state. The running time of an algorithm is the number 
of rounds, and hence locality is the main constraint in this model; we aim for 
polylogarithmic-time algorithms.

We further distinguish two models: strong and weak. In the weak model, if a 
channel has more than k edges that attempt to get assigned to it, the channel 
sends back a “No” message to the end-points of these edges, after which the 
end-points can retry. In the strong model, even such attempts are disallowed, 
and if we ever attempt to send more than k edges to a channel, the system 
enters a “Failed” state. Such a strongly constrained model is less realistic than 
the weak model - in practice, a channel can typically report that it is getting 
overloaded, without crashing. However, we also study the strong model and show 
that if all nodes know the value of m (which can be obtained if each incoming 
user “registers” with a central server which broadcasts the value of m to all 
servers), then we can develop an O(n^{1/3})-approximation algorithm even for the
strong model. (There is a positive probability of entering the Failed state in 
our algorithm for the strong model - indeed, this seems inevitable - but this 
probability can be made as small as n^{−c} for any desired constant c.) In the weak
model, the processors need not know the value of m. 

The algorithm. We first assume the strong model, where the value of m is 
known to all nodes; we will finally show how to translate our results to the 
weak model. As in Section 4.1, there is the “trivial algorithm” (which places at 
most k edges arbitrarily on each channel) whose total objective function value 
is at most 2m. The trivial algorithm can be easily implemented in the strong model with a contention-resolution type algorithm, where each edge chooses to be assigned to each channel independently with a suitable probability p. Briefly, if k > log^… n, we take, say, 4(m/k) log n channels and p = k/(2m); each edge tries each channel with probability p, and goes to the first one on which its trial came up 1. If k < log^… n, we just take p = … and take (4/p) log n channels, for a suitable constant c. It is easy to show that with high probability, we get a feasible solution with the desired objective function value of O(m). As in Section 4.1, our focus is on showing how to construct a feasible solution with objective function value O(n√(m/k)); taking the better of this solution and that of the trivial algorithm will yield an O(n^{1/3})-approximation. For the rest of this discussion, we assume k > log^… n, say; if k is smaller, the above trivial algorithm already results in a polylog(n) approximation.
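A small Python simulation of the contention-resolution form of the trivial algorithm, for the large-k case only, assuming natural logarithms and the parameters quoted above (⌈4(m/k) ln n⌉ channels, p = k/(2m)); the function name `contention_trivial` is hypothetical and the threshold on k is not checked. It reports per-channel loads, so one can see whether the strong model's limit of k edges per channel was respected in a given run.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

def contention_trivial(m: int, n: int, k: int,
                       seed: int = 0) -> Tuple[Optional[List[int]], Dict[int, int]]:
    """Simulate the contention-resolution form of the trivial algorithm (large-k case).

    Uses ceil(4 (m/k) ln n) channels and p = k/(2m). Edge e tries channels
    0, 1, 2, ... and joins the first one whose Bernoulli(p) trial succeeds.
    Returns (assignment, loads); assignment is None if some edge found no channel.
    In the strong model, any load above k would mean the Failed state.
    """
    rng = random.Random(seed)
    num_channels = math.ceil(4 * (m / k) * math.log(n))
    p = k / (2 * m)
    assignment: List[int] = []
    loads: Dict[int, int] = {}
    for _ in range(m):
        # next(...) stops drawing at the first success, which is equivalent to
        # flipping every coin and taking the first channel that came up 1.
        chosen = next((c for c in range(num_channels) if rng.random() < p), None)
        if chosen is None:
            return None, loads
        assignment.append(chosen)
        loads[chosen] = loads.get(chosen, 0) + 1
    return assignment, loads

assignment, loads = contention_trivial(m=50, n=20, k=10, seed=3)
if assignment is not None:
    print("max load:", max(loads.values()), "k =", 10)
```

Each assigned edge contributes at most its two end-points to its channel, which is where the objective bound of 2m comes from.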

The heart of our algorithm is the following: a preprocessing step followed by a random-selection step. Define d = ⌈2m/n⌉, and let deg(v) be the current degree of v. The preprocessing step is as follows; it basically ensures that the maximum degree is not much more than the average degree. Each v ∈ V makes ⌈deg(v)/d⌉ virtual copies of itself; it then distributes its deg(v) incident edges to these copies, so that no copy gets more than d edges. Thus we get a new graph with m edges and maximum degree d. It is easy to see that the new number of vertices is at most 2n. So, we have a graph with number of vertices in the range [n, 2n], which has m edges and maximum degree at most d. Now, the random-selection step is as follows. Choose αm/k new channels, where α is a suitable constant. Each vertex then independently goes into each of these channels with probability p = √(k/(2m)). (More precisely, the choices for all virtual copies of an original vertex v are all made independently by v.) An edge is assigned to a channel iff both of its end-points choose to go into that channel; if an edge gets assigned to more than one channel, it chooses one arbitrarily.
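A centralized Python sketch of one main iteration, written for illustration rather than as the distributed protocol: `split_high_degree` performs the virtual-copy preprocessing with d = ⌈2m/n⌉, and `random_selection` performs one random-selection step with ⌈αm/k⌉ channels and p = √(k/(2m)), keeping the first qualifying channel as the "arbitrary" choice. Both names, the default α = 2 and the sequential simulation are assumptions.

```python
import math
import random
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]

def split_high_degree(n: int, edges: List[Edge]) -> Tuple[List[Edge], Dict[int, int]]:
    """Preprocessing: with d = ceil(2m/n), vertex v gets ceil(deg(v)/d) virtual
    copies, each receiving at most d of v's edges. Returns the rewired edges
    (on copy ids) and a map from copy id to original vertex."""
    d = max(1, math.ceil(2 * len(edges) / n))
    owner: Dict[int, int] = {}      # copy id -> original vertex
    filled: Dict[int, int] = {}     # copy id -> edges handed to it so far
    current: Dict[int, int] = {}    # original vertex -> copy being filled
    def slot(v: int) -> int:
        c = current.get(v)
        if c is None or filled[c] == d:          # open a fresh copy of v
            c = len(owner)
            owner[c], filled[c], current[v] = v, 0, c
        filled[c] += 1
        return c
    return [(slot(u), slot(v)) for u, v in edges], owner

def random_selection(edges: List[Edge], owner: Dict[int, int], m: int, k: int,
                     alpha: float = 2.0,
                     seed: int = 0) -> Tuple[Dict[int, int], Dict[int, Set[int]]]:
    """One random-selection step: ceil(alpha m / k) channels; every (copy) vertex
    joins each channel independently with p = sqrt(k/(2m)); an edge is assigned
    to a channel iff both end-points joined it (first such channel wins)."""
    rng = random.Random(seed)
    p = math.sqrt(k / (2 * m))
    joined = {c: {v for v in sorted(owner) if rng.random() < p}
              for c in range(math.ceil(alpha * m / k))}
    assigned: Dict[int, int] = {}
    for idx, (a, b) in enumerate(edges):
        for c, members in joined.items():
            if a in members and b in members:
                assigned[idx] = c
                break
    return assigned, joined

star = [(0, i) for i in range(1, 7)]             # n = 7 vertices, m = 6 edges
new_edges, owner = split_high_degree(7, star)
print(new_edges)  # [(0, 1), (0, 2), (3, 4), (3, 5), (6, 7), (6, 8)]
assigned, joined = random_selection(new_edges, owner, m=6, k=2, seed=5)
```

In the usage example the star's center is split into three copies of degree 2, so the maximum degree drops from 6 to d = 2 while the vertex count stays within 2n.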

The above preprocessing and random-selection constitute the main iteration of the algorithm. Note that the expected number of edges on any channel is k/2, and that for any edge, the probability that it was assigned to at least one of the channels is 1 − (1 − k/(2m))^{αm/k} ≈ 1 − e^{−α/2}. The expected total load on the channels is α(m/k) · np = αn√(m/(2k)). If everything happens according to expectation, we would have covered a constant fraction δ ≈ 1 − e^{−α/2} of the edges, at a total cost of O(n√(m/k)). We can then try to iterate this argument on the residual graph, leading to a total cost of

\[ O\Big(\sum_{i \ge 0} n\sqrt{m(1-\delta)^i/k}\Big) = O\big(n\sqrt{m/k}\big); \qquad (3) \]

furthermore, the running time is basically the number of iterations, which would be O(log m) = O(log n) with high probability.
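Spelling out the arithmetic behind the two expectations and behind (3), under the idealized assumption that every iteration behaves exactly according to expectation (p = √(k/(2m)), αm/k channels, and a fraction δ of the remaining edges covered per iteration):

```latex
\begin{aligned}
\mathbb{E}[\#\text{edges on a channel}] &= m\,p^2 = m\cdot\frac{k}{2m} = \frac{k}{2},\\
\Pr[e \text{ is assigned somewhere}] &= 1-\Big(1-\frac{k}{2m}\Big)^{\alpha m/k} \approx 1-e^{-\alpha/2},\\
\sum_{i\ge 0} n\sqrt{\frac{m(1-\delta)^i}{k}}
  &= n\sqrt{\frac{m}{k}}\,\sum_{i\ge 0}\big(\sqrt{1-\delta}\big)^i
   = \frac{n\sqrt{m/k}}{1-\sqrt{1-\delta}} = O\big(n\sqrt{m/k}\big).
\end{aligned}
```

Since 1 − δ ≈ e^{−α/2} is a constant below 1, the geometric factor 1/(1 − √(1 − δ)) is a constant, which is what makes the total in (3) a constant multiple of the cost of the first iteration.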

The above idea on total cost can be carried through by using the Chernoff-Hoeffding bounds [5,16]. However, bounding the number of edges assigned to a channel is harder, due to correlations; moreover, the correlation among the edges is in the “wrong” direction, as far as proving a concentration of measure is concerned. This is where our preprocessing step helps; intuitively, since it eliminates high-degree vertices, the correlation among the edges is lessened. First of all, to lower-bound the number of edges assigned to any channel, we use Janson’s inequality for lower-tail bounds [17]. Fix a particular channel. Let X_e = 1 if edge e is assigned to that channel, and X_e = 0 otherwise. Define e ~ f iff the edges e and f are different and have a common end-point. Then Δ = Σ_{e~f} Pr[X_e = X_f = 1] can be bounded by O(n d² p³) = O(k^{3/2}√m/n); this is because there are O(n d²) pairs (e, f) such that e ~ f, and because Pr[X_e = X_f = 1] = p³ for any such pair. Thus, by Janson’s inequality, the probability that at most k/4 edges get assigned to that channel is at most exp(−Ω(k²/(k + Δ))). Since we have assumed that k > log^… n, this is negligibly small. Next, in order to follow the constraint of the strong model, we also need to show that at most k edges get assigned to the channel. Such upper-tail bounds are usually harder, but the recent tail bounds of [18,22] can be shown to help; they help show that the probability of more than k edges getting assigned to a channel is once again negligibly small. (The fact that our preprocessing significantly reduces the maximum degree once again plays a key role.)

The above brief sketch shows that “everything relevant happens nearly according to expectation”, with high probability. The nodes no longer know the exact value of m after one or more iterations, but choose an estimate slightly larger than expectation, and repeat. We can now use the argument following (3) to claim our performance bounds, and this concludes our brief discussion of the main ideas. Finally, for the weak model, we do not know the value of m, but guess it by repeated doubling. More precisely, we first run the above protocol for the strong model assuming m = 2; for each surviving edge, its end-points then run the above protocol for m = 4, and so on. When we finally hit the correct value of m, we will terminate with high probability. Since the cost function in (3) is proportional to √m, our final cost now is just a constant times that of (3) with high probability; the running time remains polylogarithmic.
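The constant-factor claim for repeated doubling can be checked numerically with a small Python sketch, assuming (as the argument above does) that a run with guess g costs proportionally to √g; the function name `doubling_cost_ratio` is hypothetical. The guesses 2, 4, 8, … form a geometric series in √g, so the total never exceeds 2/(√2 − 1) ≈ 4.83 times the cost of a single run with the true m.

```python
import math

def doubling_cost_ratio(m: int) -> float:
    """Cost of the guesses 2, 4, 8, ... (stopping at the first guess >= m),
    divided by the cost sqrt(m) of a single run with the true m, assuming a
    run with guess g costs proportionally to sqrt(g)."""
    total, g = 0.0, 2
    while True:
        total += math.sqrt(g)
        if g >= m:
            return total / math.sqrt(m)
        g *= 2

# The geometric growth keeps the ratio below 2 / (sqrt(2) - 1), about 4.83.
print(max(doubling_cost_ratio(m) for m in range(1, 5000)) < 2 / (math.sqrt(2) - 1))  # True
```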
Abstract. This paper explores three concepts: the k-center problem, some of its variants, and asymmetry. The k-center problem is a fundamental clustering problem, similar to the k-median problem. Variants of k-center may more accurately model real-life problems than the original formulation. Asymmetry is a significant impediment to approximation in many graph problems, such as k-center, facility location, k-median and the TSP.

We demonstrate an O(log* n)-approximation algorithm for the asymmetric weighted k-center problem. Here, the vertices have weights and we are given a total budget for opening centers. In the p-neighbor variant each vertex must have p (unweighted) centers nearby: we give an O(log* k)-bicriteria algorithm using 2k centers, for small p.

Finally, we show the following three versions of the asymmetric k-center problem to be inapproximable: priority k-center, k-supplier, and outliers with forbidden centers.



1 Introduction 

Imagine you have a delivery service. You want to place your delivery hubs at locations that minimize the maximum distance between customers and their nearest hubs. This is the k-center problem: a type of clustering problem that is similar to the facility location and k-median problems. The motivation for the asymmetric k-center problem, in our example, is that traffic patterns or one-way streets might cause the travel time from one point to another to differ depending on the direction of travel. Traditionally, the k-center problem was solved in the context of a metric; in this paper we retain the triangle inequality, but abandon the symmetry.

Symmetry is a vital concept in graph approximation algorithms. Very recently, the asymmetric k-center problem was shown to be Ω(log* n) hard to approximate [6, 7], even though the symmetric version has a factor 2 approximation. Moreover, facility location and k-median both have constant factor algorithms in the symmetric case, but are provably Ω(log n) hard to approximate without symmetry [1]. The traveling salesman problem is a little better, in that no Ω(log n) hardness is known, but without symmetry no algorithm better than O(log n) has been found either.

Definition 1 (k-Center). Given G = (V, E), a complete graph with nonnegative (but possibly infinite) edge costs, and a positive integer k, find a set S of k vertices, called centers, with minimum covering radius. The covering radius of a set S is the minimum distance R such that every vertex in V is within distance R of some vertex in S.
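To make the asymmetric covering radius concrete, here is a brute-force Python sketch (exponential, for tiny instances only); the names `covering_radius` and `brute_force_k_center` are hypothetical. Following the direction used in Definition 2 below, center c covers v within R when the distance from c to v, d[c][v], is at most R.

```python
from itertools import combinations
from typing import Sequence, Tuple

def covering_radius(d: Sequence[Sequence[float]], centers: Sequence[int]) -> float:
    """Smallest R such that every vertex v has a center c with d[c][v] <= R."""
    return max(min(d[c][v] for c in centers) for v in range(len(d)))

def brute_force_k_center(d: Sequence[Sequence[float]], k: int) -> Tuple[float, Tuple[int, ...]]:
    """Exhaustive search over all k-subsets (exponential; tiny instances only)."""
    return min((covering_radius(d, S), S) for S in combinations(range(len(d)), k))

# Asymmetric distances obeying the triangle inequality: d[0][1] = 1 but d[1][0] = 2.
d = [[0, 1, 2],
     [2, 0, 1],
     [1, 2, 0]]
print(brute_force_k_center(d, 1))  # (2, (0,))
print(brute_force_k_center(d, 2))  # (1, (0, 1))
```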

Kariv and Hakimi [11] showed that the k-center problem is NP-hard. Without the triangle inequality the problem is NP-hard to approximate within any factor; we henceforth assume that the edge costs satisfy the triangle inequality.

The asymmetric k-center problem has proven to be much more difficult to understand than its symmetric counterpart. Hsu and Nemhauser [10] showed that the k-center problem cannot be approximated within a factor of (2 − ε) unless P = NP. In 1985 Hochbaum and Shmoys [8] provided a (best possible) factor 2 algorithm for the symmetric k-center problem. In 1996 Panigrahy and Vishwanathan [16, 13] gave the first approximation algorithm for the asymmetric problem, with factor O(log* n). Archer [2] proposed two O(log* k) algorithms based on many of the ideas in [13]. We now know [6, 7] that these algorithms are asymptotically the best possible.



Variants of the k-Center Problem

A number of variants of the k-center problem have been explored in the context of symmetric graphs. Perhaps some delivery hubs are more expensive to establish than others: instead of a restriction on the number of centers we can use, each vertex has a weight and we have a budget W that limits the total weight of centers. Hochbaum and Shmoys [9] produced a factor 3 algorithm for this weighted k-center problem. This has recently been shown to be tight [6].

Hochbaum and Shmoys [9] also studied the k-supplier problem where the 
vertex set is segregated into suppliers and customers. Only supplier vertices 
can be centers and only the customer vertices need to be covered. Hochbaum 
and Shmoys gave a 3-approximation algorithm and showed that this is the best 
possible. 

Khuller et al. [12] investigated the p-neighbor k-center problem where each vertex must have p centers nearby. This problem is motivated by the need to account for facility failures: even if up to p − 1 facilities fail, every demand point has a functioning facility nearby. They gave a 3-approximation algorithm for all p, and a best possible 2-approximation algorithm when p < 4, noting that the case where p is small is “perhaps the practically interesting case”.

Perhaps some demand points are more important than others. Plesnik [14] 
studied the priority k-center problem, in which the effective distance to a demand 
point is increased in proportion to its specified priority. Plesnik approximates 
the symmetric version within a factor of 2. 
Charikar et al. [4] note that a disadvantage of the standard k-center formulation is that a few distant clients, outliers, can force centers to be located in isolated places. They suggest a variant of the problem, the k-center problem with outliers and forbidden centers, where a small subset of clients can be denied service, and some points are forbidden from being centers. Charikar et al. gave a (best possible) 3-approximation algorithm for the symmetric version of this problem.

Bhatia et al. [3] considered a network model, such as a city street network, in which the traversal times change as the day progresses. This is known as the k-center problem with dynamic distances: we wish to assign the centers such that the objective criteria are met at all times.



Results and Organization 

Table 1 gives an overview of the best known results for the various k-center problems. In this paper we explore asymmetric variants that are not yet in the literature.



Table 1. An overview of the approximation results for k-center variants. †β is the maximum ratio of an edge's greatest length to its shortest length. ‡This is a bicriteria algorithm using k(1 + 3/(ν + 1)) centers. §For p < 4. ¶This is a bicriteria algorithm using 2k centers, for p < n/k.

Problem                                        Symmetric          Asymmetric
k-center                                       2          [8]     O(log* k)         [2]
k-center with dynamic distances                1 + β†     [3]     O(log* n + ν)‡    [3]
weighted k-center                              3          [9]     O(log* n)         Here
p-neighbor k-center                            3 (2§)     [5]     O(log* k)¶        Here
priority k-center                              2          [14]    Inapproximable    Here
k-center with outliers and forbidden centers   3          [4]     Inapproximable    Here
k-suppliers                                    3          [9]     Inapproximable    Here



Section 2 contains the definitions and notation required to develop the results. In Section 3 we briefly review the algorithms of Panigrahy and Vishwanathan [13], and Archer [2]. The techniques used in the standard k-center problem are often applicable to the variants.

Our first result, in Section 4, is an O(log* n)-approximation for the asymmetric weighted k-center problem. In Section 5 we develop an O(log* k) approximation for the asymmetric p-neighbor k-center problem, for p < n/k. As noted by Khuller et al. [12], the case where p is small is the most interesting case in practice. This is a bicriteria algorithm, allowing an increase to 2k centers, but it can be turned into an O(log k)-approximation algorithm using only k centers. Turning to hardness, we show that the asymmetric versions of the k-center problem with outliers (and forbidden centers), the priority k-center problem, and the k-supplier problem are NP-hard to approximate (Section 6).

2 Definitions 

The input to the asymmetric k-center problem is a distance function d on ordered pairs of vertices (distances are allowed to be infinite) and a bound k on the number of centers.

Definition 2. Vertex c covers vertex v within r, or c r-covers v, if d_{cv} ≤ r. We extend this definition to sets: a set C covers a set A within r if for every a ∈ A there is a c ∈ C such that c covers a within r. Often we abbreviate “1-covers” to “covers”.

Most of the algorithms do not in fact operate on graphs with edge costs. Rather, they consider restricted graphs, in which only those edges with distance at most some threshold are included, and the edges have unit cost. Hochbaum and Shmoys [9] refer to these as bottleneck graphs. Since the optimal value of the covering radius must be one of the distance values, many algorithms essentially run through a sequence of restricted graphs of every possible threshold radius in ascending order. This can be thought of as guessing the optimal radius R_OPT. The approach works because the algorithm either returns a solution within the specified factor of the current threshold radius, or it fails, in which case R_OPT must be greater than the current radius.

Definition 3 (Restricted Graph G_r). For r > 0, define the restricted graph G_r of the graph G = (V, E) to be the graph G_r = (V, E_r), where E_r = {(i, j) : d_ij ≤ r} and all edges have unit cost.
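The threshold search described above can be sketched in Python as follows; `restricted_graph`, `guess_radius` and the toy solver `one_center` (exact for k = 1 only) are hypothetical names. Any real algorithm, for instance the O(log* n) procedure reviewed in Section 3, would be plugged in as `solve`, returning None to signal failure at the current threshold.

```python
from typing import Callable, Dict, Optional, Sequence, Set, Tuple

Adjacency = Dict[int, Set[int]]

def restricted_graph(d: Sequence[Sequence[float]], r: float) -> Adjacency:
    """Out-neighbours in G_r: arc i -> j (i != j) iff d[i][j] <= r; arcs have unit cost."""
    n = len(d)
    return {i: {j for j in range(n) if j != i and d[i][j] <= r} for i in range(n)}

def guess_radius(d: Sequence[Sequence[float]],
                 solve: Callable[[Adjacency], Optional[object]]) -> Optional[Tuple[float, object]]:
    """Run `solve` on G_r for every distance value r in ascending order and return
    the first (r, solution) pair; `solve` returns None to signal failure."""
    for r in sorted({x for row in d for x in row}):
        sol = solve(restricted_graph(d, r))
        if sol is not None:
            return r, sol
    return None

def one_center(adj: Adjacency) -> Optional[int]:
    """Toy exact solver for k = 1: a vertex with an arc to every other vertex."""
    return next((c for c in sorted(adj) if len(adj[c]) == len(adj) - 1), None)

d = [[0, 1, 2],
     [2, 0, 1],
     [1, 2, 0]]
print(guess_radius(d, one_center))  # (2, 0)
```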

Most of the following definitions apply to restricted graphs. 

Definition 4 (Power of Graphs). The t-th power of a graph G = (V, E) is the graph G^t = (V, E^t), t ≥ 1, where E^t is the set of edges between distinct vertices that have a path of at most t edges between them in G.

Definition 5. For i ∈ ℕ define Γ_i^+(v) = {u ∈ G | (v, u) ∈ E^i} ∪ {v}, and Γ_i^−(v) = {u ∈ G | (u, v) ∈ E^i} ∪ {v}, i.e., in the restricted graph there is a path of length at most i from v to u, respectively from u to v.

Notice that in a symmetric graph Γ_i^+(v) = Γ_i^−(v). We extend this notation to sets so that Γ_i^+(S) = {u ∈ G | u ∈ Γ_i^+(v) for some v ∈ S}, with Γ_i^−(S) defined similarly. We use Γ^+(v) and Γ^−(v) instead of Γ_1^+(v) and Γ_1^−(v).

Definition 6. For i ∈ ℕ define Υ_i^+(v) = Γ_i^+(v) \ Γ_{i−1}^+(v), and Υ_i^−(v) = Γ_i^−(v) \ Γ_{i−1}^−(v), i.e., the nodes for which the path distance from v is exactly i, and the nodes for which the path distance to v is exactly i, respectively.

For a set S, the extension follows the pattern Υ_i^+(S) = Γ_i^+(S) \ Γ_{i−1}^+(S). We use Υ^+(v) and Υ^−(v) instead of Υ_1^+(v) and Υ_1^−(v).
Definition 7 (Center Capturing Vertex (CCV)). A vertex v is a center capturing vertex (CCV) if Γ^−(v) ⊆ Γ^+(v), i.e., v covers every vertex that covers v.

In the graph G_{R_OPT} the optimum center that covers v must lie in Γ^−(v). For a CCV v, it lies in Γ^+(v), hence the name. In symmetric graphs all vertices are CCVs and this property leads to the standard 2-approximation.

Definition 8 (Dominating Set). Given a graph G = (V, E) and a weight function w : V → ℚ^+ on the vertices, find a minimum-weight subset D ⊆ V such that every vertex v ∈ V is covered by D, i.e., v ∈ Γ^+(D) for all v ∈ V.

Definition 9 (Set Cover). Given a universe U of n elements, a collection S = {S_1, ..., S_k} of subsets of U, and a weight function w : S → ℚ^+, find a minimum-weight sub-collection of S that includes all elements of U.

The Max Coverage problem, on an instance (U, S, k), is similar to the Set Cover problem: instead of trying to minimize the number of sets used, we have a bound on the number of sets we can use, and the problem is then to maximize the number of elements covered. The Dominating Set, Set Cover, and Max Coverage problems are all NP-complete.

3 Asymmetric k-Center Review

The O(log* n) algorithm of Panigrahy and Vishwanathan [13] has two phases, the halve phase, sometimes called the reduce phase, and the augment phase. As described above, the algorithm guesses R_OPT and works in the restricted graph G_{R_OPT}. In the halve phase we find a CCV v, include it in the set of centers, mark every vertex in Γ_2^+(v) as covered, and repeat until no CCVs remain unmarked. The CCV property ensures that, as each CCV is found, the rest of the graph can be covered with one fewer center. Hence if k'' CCVs are obtained, the unmarked portion of the graph can be covered with k' = k − k'' centers. The authors then prove that this unmarked portion, being CCV-free, can be covered with only k'/2 centers if we use radius 5 instead of 1. That is to say, k'/2 centers suffice in the graph G^5_{R_OPT}.
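A Python sketch of the halve phase on a restricted graph given as out-neighbour sets (for example, the output of a restricted-graph construction at the guessed radius); the names `gamma_plus` and `halve_phase` are hypothetical, and the augment phase is not included. The CCV test is exactly Γ^−(v) ⊆ Γ^+(v), and each chosen CCV marks Γ_2^+(v).

```python
from typing import Dict, List, Set, Tuple

def gamma_plus(adj: Dict[int, Set[int]], v: int, i: int) -> Set[int]:
    """Γ_i^+(v): vertices reachable from v by a path of at most i arcs (v included)."""
    seen, frontier = {v}, {v}
    for _ in range(i):
        frontier = {w for u in frontier for w in adj[u]} - seen
        seen |= frontier
    return seen

def halve_phase(adj: Dict[int, Set[int]]) -> Tuple[List[int], Set[int]]:
    """Pick unmarked CCVs v (Γ^-(v) ⊆ Γ^+(v)) as centers, marking Γ_2^+(v) each time.
    Returns the CCV centers and the set of vertices left unmarked."""
    preds: Dict[int, Set[int]] = {v: {v} for v in adj}    # Γ^-(v)
    for u, outs in adj.items():
        for w in outs:
            preds[w].add(u)
    ccvs = [v for v in sorted(adj) if preds[v] <= gamma_plus(adj, v, 1)]
    marked: Set[int] = set()
    centers: List[int] = []
    for v in ccvs:                  # being a CCV does not depend on the marking
        if v not in marked:
            centers.append(v)
            marked |= gamma_plus(adj, v, 2)
    return centers, set(adj) - marked

print(halve_phase({0: {1}, 1: {0, 2}, 2: set(), 3: {2}}))  # ([0, 3], set())
```

On a directed 3-cycle no vertex is a CCV, so everything stays unmarked and is left to the augment phase.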

The k-center problem in the restricted graph is identical to the dominating set problem. This is a special case of set cover in which the sets are the Γ^+ terms. In the augment phase, the algorithm recursively uses the greedy set cover procedure. Since the optimal cover uses at most k'/2 centers, the first cover has size at most (k'/2) · log(…).
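A minimal Python sketch of the greedy set cover procedure with sets Γ^+(v), and of the recursive use described next (cover the active vertices, then cover that cover, and so on). The names `greedy_cover` and `augment` are hypothetical, and the fixed `rounds` parameter stands in for the O(log* n) iteration count; no size bounds are enforced here.

```python
from typing import Dict, List, Set

def greedy_cover(adj: Dict[int, Set[int]], targets: Set[int]) -> List[int]:
    """Greedy set cover with sets Γ^+(v) = {v} ∪ adj[v]: repeatedly take the vertex
    whose set covers the most still-uncovered targets (ties: smallest id)."""
    uncovered, chosen = set(targets), []
    while uncovered:
        best = max(sorted(adj), key=lambda v: len(({v} | adj[v]) & uncovered))
        chosen.append(best)
        uncovered -= {best} | adj[best]
    return chosen

def augment(adj: Dict[int, Set[int]], active: Set[int], rounds: int) -> List[int]:
    """Cover the active set, then cover that cover, and so on for `rounds` rounds;
    the final set reaches the active set within `rounds` steps."""
    current = set(active)
    for _ in range(rounds):
        current = set(greedy_cover(adj, current))
    return sorted(current)

adj = {0: {1, 2, 3}, 1: {0}, 2: set(), 3: set(), 4: {3}}
print(greedy_cover(adj, {1, 2, 3}), greedy_cover(adj, {3, 4}))  # [0] [4]
```

Every target t lies in its own set Γ^+(t), so each greedy step covers at least one new target and the loop terminates.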

The centers in this first cover are themselves covered, using the greedy set cover procedure, then the centers in the second cover, and so forth. After O(log* n) iterations the algorithm finds a set of at most k' vertices that, together with the CCVs, O(log* n)-covers the unmarked portion, since the optimal solution has k'/2 centers. Combining these with the k'' CCVs, we have k centers covering the whole graph.
Archer [2] presents two O(log* k) algorithms, both building on the work in [13]. The algorithm more directly connected with the earlier work nevertheless has two fundamental differences. Firstly, in the reduce phase Archer shows that the CCV-free portion of the graph can be covered with 2k'/3 centers and radius 3. Secondly, he constructs a set cover-like linear program and solves the relaxation to get a total of k' fractional centers that cover the unmarked vertices. From these fractional centers, he obtains a 2-cover of the unmarked vertices with k' log k' (integral) centers. These are the seed for the augment phase, which thus produces a solution with an O(log* k') approximation to the optimum radius.

During the preparation of the final version of this manuscript, it was announced that the asymmetric k-center problem is Ω(log* n) hard to approximate [6, 7], closing the gap with the upper bound.

4 Asymmetric Weighted k-Center

Recall the application in which the costs of delivery hubs vary. In this situation, 
rather than having a restriction on the number of centers used, each vertex has 
a weight and we have a budget W that restricts the total weight of centers used. 

Definition 10 (Weighted k-Center). Given a weight function on the vertices, w : V → ℚ^+, and a bound W ∈ ℚ^+, the problem is to find S ⊆ V of total weight at most W, so that S covers V with minimum radius.

Hochbaum and Shmoys [9] gave a 3-approximation algorithm for the sym- 
metric weighted version, applying their approach for bottleneck problems. We 
propose an O(log* n)-approximation for the asymmetric version, based on Panigrahy and Vishwanathan’s technique for the unweighted problem. Note that in
light of the hardness result just announced [6, 7], this algorithm is asymptoti- 
cally optimal. Another variant has both the k and the W restrictions, but we 
will not expand on that problem here. 

First, a brief sketch of the algorithm, which works with restricted graphs. In the reduce phase, having found a CCV v, we pick the lightest vertex u in Γ^−(v) (which might be v itself) as a center in our solution. We then mark everything in Γ_3^+(u) as covered, and continue looking for CCVs. We can show that there exists a 7-cover of the unmarked vertices with total weight at most half the optimum. Finally we recursively apply a greedy procedure for weighted elements O(log* n) times, similar to the one used for Set Cover. The total weight of centers in our solution set is at most W.
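The weighted reduce phase just sketched can be illustrated in Python, assuming the restricted graph is given as out-neighbour sets and weights as a dictionary; the names `reach` and `weighted_reduce` are hypothetical, ties in weight are broken by smallest id, and the final greedy phase is omitted.

```python
from typing import Dict, List, Set, Tuple

def reach(adj: Dict[int, Set[int]], v: int, i: int) -> Set[int]:
    """Γ_i^+(v): vertices reachable from v within i arcs (v included)."""
    seen, frontier = {v}, {v}
    for _ in range(i):
        frontier = {w for u in frontier for w in adj[u]} - seen
        seen |= frontier
    return seen

def weighted_reduce(adj: Dict[int, Set[int]],
                    weight: Dict[int, float]) -> Tuple[List[int], Set[int]]:
    """For each unmarked CCV v, open the lightest u in Γ^-(v) and mark Γ_3^+(u).
    Returns the opened centers and the vertices left unmarked (still active)."""
    preds: Dict[int, Set[int]] = {v: {v} for v in adj}    # Γ^-(v)
    for u, outs in adj.items():
        for w in outs:
            preds[w].add(u)
    marked: Set[int] = set()
    centers: List[int] = []
    for v in sorted(adj):
        if v in marked or not preds[v] <= reach(adj, v, 1):
            continue                                       # marked, or not a CCV
        u = min(preds[v], key=lambda x: (weight[x], x))    # lightest vertex of Γ^-(v)
        centers.append(u)
        marked |= reach(adj, u, 3)
    return centers, set(adj) - marked

adj = {0: {1}, 1: {0, 2}, 2: set(), 3: {2}}
print(weighted_reduce(adj, {0: 5, 1: 1, 2: 1, 3: 2}))  # ([1, 3], set())
```

In the example the CCV 0 is not opened itself; its lighter in-neighbour 1 is opened instead, which still 3-covers everything that 0 would have captured.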

The following lemma about digraphs is the key to our reduce phase and is analogous to Lemma 4 in [13] and Lemma 16 in [2].

Lemma 1 (Cover of Half the Graph’s Weight). Let G = (V, E) be a digraph with weighted vertices, but unit edge costs. Then there is a subset S ⊆ V, w(S) ≤ w(V)/2, such that every vertex with positive indegree is reachable in at most 3 steps from some vertex in S.



Asymmetry in k-Center Variants






Proof. To construct the set S, repeat the following, to the extent possible: select a vertex with positive outdegree, but if possible select one with indegree zero. Let v be the selected vertex and compare the sets {v} and Γ^+(v) \ {v}: add the set of smaller weight to S and remove Γ^+(v) from G.

It is clear that the weight of $S$ is no more than half the weight of $V$. We must now show that $S$ 3-covers all non-orphan vertices (vertices of positive indegree); we call $x$ a parent of $y$ if $x \in \Gamma^-(y)$.

The children of $v$ are clearly 1-covered. Assume $v$ is not in $S$ (the case $v \in S$ is trivial). If $v$ was an orphan initially, then ignore it. If $v$ is an orphan when selected, then some parent must have been removed by the selection of a grandparent, so $v$ is 2-covered.

So $v$ has at least one parent when it is selected, implying there are no orphan vertices of positive outdegree at that time. Therefore the sets of parents of $v$, $S_1$, grandparents of $v$, $S_2$, and great-grandparents of $v$, $S_3$, are not empty. These sets might not be pairwise disjoint. However, since $v \notin S$, all of $v$'s children are in $S$, so if these sets contained any of $v$'s children, then $v$ would be 3-covered.

After $v$ is removed, there are three possibilities for $S_2$: (i) some vertex in $S_3$ is selected, removing part of $S_2$; (ii) some vertex in $S_2$ is selected and removed; (iii) some vertex in $S_1$ is selected, possibly making some $S_2$ vertices childless. One of these events must happen, since $S_1$ and $S_2$ are non-empty. As a consequence, $v$ is 3-covered. □
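The construction in the proof of Lemma 1 can be simulated directly. The Python sketch below is our own illustration, not code from the paper. It makes several assumptions. The digraph is a dict mapping every vertex to the set of its out-neighbours, and every out-neighbour is also a key. There are no self-loops. $\Gamma^+(v)$ is taken to be $v$ together with its current out-neighbours. Ties are broken by dict insertion order. The helper `reachable_within` is a hypothetical name, used here only to check the 3-cover property.

```python
from collections import deque


def reachable_within(out_adj, sources, r):
    """Vertices reachable from `sources` in at most r steps (unit edge costs)."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        x = queue.popleft()
        if dist[x] == r:
            continue
        for y in out_adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)


def half_weight_three_cover(out_adj, weight):
    """Build S as in the proof of Lemma 1 (illustrative sketch)."""
    alive = set(out_adj)
    succ = {v: set(out_adj[v]) for v in out_adj}
    pred = {v: set() for v in out_adj}
    for v, outs in out_adj.items():
        for u in outs:
            pred[u].add(v)
    S = set()
    while True:
        # Vertices that still have positive outdegree in the current graph.
        candidates = [v for v in out_adj if v in alive and succ[v]]
        if not candidates:
            return S
        # Prefer a vertex of indegree zero whenever one exists.
        orphans = [v for v in candidates if not pred[v]]
        v = (orphans or candidates)[0]
        children = set(succ[v])  # Gamma^+(v) minus v itself
        # Add the lighter of {v} and Gamma^+(v) \ {v} to S.
        if weight[v] <= sum(weight[c] for c in children):
            S.add(v)
        else:
            S |= children
        # Remove Gamma^+(v), i.e. v and its children, from the graph.
        for x in children | {v}:
            alive.discard(x)
            for y in succ[x]:
                pred[y].discard(x)
            for y in pred[x]:
                succ[y].discard(x)
```

On the directed 5-cycle $0 \to 1 \to 2 \to 3 \to 4 \to 0$ with unit weights, the sketch returns $\{0, 2\}$. This set has weight $2 \le 5/2$, and every vertex is within 3 steps of it.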

Henceforth call the vertices that have not yet been covered/marked active. 
Using Lemma 1 we can show that after removing the CCVs from the graph, 
we can cover the active set with half the weight of an optimum cover if we are 
allowed to use distance 7 instead of 1. 

Lemma 2 (Cover of Half Optimal Weight). Consider a subset $A \subseteq V$ that has a cover consisting of vertices of total weight $W$, but no CCVs. Assume there exists a set $C_1$ that 3-covers exactly $V \setminus A$. Then there exists a set of vertices $S$ of total weight at most $W/2$ that, together with $C_1$, 7-covers $A$.

Proof. Let $U$ be the subset of optimal centers that cover $A$. We call $u \in U$ a near center if it can be reached in 4 steps from $C_1$, and a far center otherwise. Since $C_1$ 5-covers all of the nodes covered by near centers, it suffices to choose $S$ to 6-cover the far centers, so that $S$ will 7-cover all the nodes they cover.

Define an auxiliary graph $H$ on the (optimal) centers $U$ as follows. There is an edge from $x$ to $y$ in $H$ if and only if $x$ 2-covers $y$ in $G$ (and $x \ne y$). The idea is to show that any far center has positive indegree in $H$. As a result, Lemma 1 shows there exists a set $S \subseteq U$ with $w(S) \le W/2$ such that $S$ 3-covers the far centers in $H$, and thus 6-covers them in $G$.

Let $x$ be any far center. Since $A$ contains no CCVs, there exists $y$ such that $y$ covers $x$, but $x$ does not cover $y$. Since $x \notin \Gamma^+_4(C_1)$, we have $y \notin \Gamma^+_3(C_1)$, and thus $y \in A$ (since everything not 3-covered by $C_1$ is in $A$). Thus there exists a center $z \in U$, which is not $x$ but might be $y$, that covers $y$ and therefore 2-covers $x$. Hence $x$ has positive indegree in the graph $H$. □
As we foreshadowed, we will use the greedy heuristic to complete the algorithm. We now analyze the performance of this heuristic in the context of the dominating set problem in node-weighted graphs. All vertices $V$ are available as potential members of the dominating set (i.e., centers), but we need only dominate the active vertices $A$. The heuristic is to select the most efficient vertex: the one that maximizes $w(A(v))/w(v)$, where $A(v) = A \cap \Gamma^+(v)$.
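The greedy heuristic just described can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: vertex weights are positive, $\Gamma^+(v)$ includes $v$ itself, and ties are broken by dict order. The function name is hypothetical.

```python
def greedy_weighted_domination(out_adj, weight, active):
    """Greedy heuristic analyzed in Lemma 3 (illustrative sketch).

    out_adj: vertex -> set of out-neighbours; every vertex may be a center.
    weight: vertex -> positive weight.
    active: the set A of vertices that must be dominated.
    Returns the centers in the order in which they were picked.
    """
    remaining = set(active)
    centers = []

    def newly_covered(v):
        # A(v) = A intersect Gamma+(v), where Gamma+(v) contains v itself.
        return remaining & ({v} | out_adj[v])

    while remaining:
        # Most efficient vertex: maximize w(A(v)) / w(v).
        best = max(
            out_adj,
            key=lambda v: sum(weight[u] for u in newly_covered(v)) / weight[v],
        )
        gained = newly_covered(best)
        if not gained:
            raise ValueError("some active vertex cannot be dominated")
        centers.append(best)
        remaining -= gained
    return centers
```

For example, with arcs `a -> b, a -> c, b -> c, d -> c`, weights `a: 2` and `b, c, d: 1`, and active set `{b, c, d}`, the sketch first picks `b` (efficiency 2) and then `d`.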



Lemma 3 (Greedy Algorithm in Weighted Dominating Set). Let $G = (V, E)$, $w: V \to \mathbb{Q}^+$ be an instance of the dominating set problem in which a set $A$ is to be dominated. Also, let $w^*$ be the weight of an optimum solution for this instance. The greedy algorithm gives an approximation guarantee of
$$2 + \ln\frac{w(A)}{w^*} = O\left(\log\frac{w(A)}{w^*}\right).$$

Proof. In every application of the greedy selection there must be some vertex $v \in V$ for which
$$\frac{w(A(v))}{w(v)} \ge \frac{w(A)}{w^*}, \quad\text{i.e.,}\quad \frac{w(A(v))}{w(A)} \ge \frac{w(v)}{w^*}, \tag{1}$$
otherwise no optimum solution of weight $w^*$ would exist. This is certainly true of the most efficient vertex $v$, so make it a center and mark all that it covers, leaving $A'$ uncovered. Now,
$$w(A') = w(A) - w(A(v)) \le w(A)\left(1 - \frac{w(v)}{w^*}\right) \le w(A)\exp\left(-\frac{w(v)}{w^*}\right).$$

After $j$ steps, the remaining active vertices, $A^j$, satisfy
$$w(A^j) \le w(A^0)\prod_{i=1}^{j}\exp\left(-\frac{w(v_i)}{w^*}\right), \tag{2}$$
where $v_i$ is the $i$th center picked (greedily) and $A^0$ is the original active set.

Assume that after some number of steps, say $j$ (take the first such $j$), there are still some active elements, but the upper bound in (2) drops below $w^*$. That is to say,
$$\sum_{i=1}^{j} w(v_i) > w^*\ln\left(w(A^0)/w^*\right).$$

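Before we picked the vertex $v_j$ we had
$$\sum_{i=1}^{j-1} w(v_i) \le w^*\ln\left(w(A^0)/w^*\right), \quad\text{and so,}\quad \sum_{i=1}^{j} w(v_i) \le w^* + w^*\ln\left(w(A^0)/w^*\right),$$
because (1), together with $w(A(v)) \le w(A)$, tells us that $w(v_j)$ is no greater than $w^*$. To cover the remainder, $A^j$, we just use $A^j$ itself, at a cost of at most $w^*$. Hence the total weight of the solution is at most $w^*\left(2 + \ln\left(w(A^0)/w^*\right)\right)$.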

On the other hand, if the upper bound on $w(A^j)$ never drops below $w^*$ before $A^j$ becomes empty, then we have a solution of weight at most $w^*\ln\left(w(A^0)/w^*\right)$. □
We now show that this tradeoff between covering radius and optimal cover size leads to an $O(\log^* n)$ approximation.

Lemma 4 (Recursive Set Cover). Given $A \subseteq V$ such that $A$ has a cover of weight $W$, and a set $C_1 \subseteq V$ that covers $V \setminus A$, we can find in polynomial time a set of vertices of total weight at most $2W$ that, together with $C_1$, covers $A$ (and hence $V$) within a radius of $O(\log^* n)$.

Proof. Our first attempt at a solution, $S_0$, is all vertices of weight no more than $W$: only these vertices could be in the optimum center set. Their total weight is at most $nW$. Since $C_1$ covers $S_0 \setminus A$, consider $A_0 = S_0 \cap A$, which has a cover of weight $W$. Lemma 3 shows that the greedy algorithm results in a set $S_1$ that covers $A_0$, and has weight
$$w(S_1) \le O\left(W\log\frac{Wn}{W}\right) = O(W\log n).$$
Set $C_1$ covers $S_1 \setminus A$, so we need only consider $A_1 = S_1 \cap A$, and so forth. At the $i$th iteration we have $w(S_i) \le O\left(W\log(w(S_{i-1})/W)\right)$ and hence, by induction, at most $O(W\log^{(i)} n)$, where $\log^{(i)}$ denotes the $i$-fold iterated logarithm. Thus after $\log^* n$ iterations the weight of our solution set falls to $2W$. □
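To see why the recursion in Lemma 4 stops after so few rounds, the helpers below compute $\log^* n$ and the iterated logarithm $\log^{(i)} n$. The names are hypothetical, and base-2 logarithms are an assumption. The asymptotic statements in the text do not depend on the base.

```python
import math


def log_star(n):
    """Iterated logarithm: how many times log2 must be applied to reach <= 1."""
    count = 0
    x = float(n)
    while x > 1.0:
        x = math.log2(x)
        count += 1
    return count


def iterated_log(n, i):
    """log^{(i)} n: apply log2 i times (assumes the value stays positive)."""
    x = float(n)
    for _ in range(i):
        x = math.log2(x)
    return x
```

For example, `log_star(65536)` is 4, and even $n = 2^{65536}$ gives only $\log^* n = 5$.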

All the algorithmic tools can be assembled to form an approximation algorithm. 

Theorem 1 (Approximation of Weighted k-Center). We can approximate the weighted k-center problem within a factor of $O(\log^* n)$ in polynomial time.

Proof. Guess the optimum radius, $R_{\mathrm{OPT}}$, and work in the restricted graph $G_{R_{\mathrm{OPT}}}$. Initially, the active set $A$ is $V$. Repeat the following as many times as possible: pick a CCV $v$ in $A$, add the lightest vertex $u$ in $\Gamma^-(v)$ to our solution set of centers, and remove the set $\Gamma^+_3(u)$ from $A$. Since $v$ is covered by an optimum center in $\Gamma^-(v)$, $u$ is no heavier than this optimum center. Moreover, $\Gamma^+_3(u)$ includes everything covered by the optimum center, because that center lies in $\Gamma^-(v) \subseteq \Gamma^+(v)$ and is therefore within 2 steps of $u$.

Let $C_1$ be the centers chosen in this first phase. We know the remainder of the graph, $A$, has a cover of total weight $W' = W - w(C_1)$, because of our choices based on CCV and weight.

Lemma 2 shows that we can cover the remaining uncovered vertices with weight no more than $W'/2$ if we use distance 7. So let the active set $A$ be $V \setminus \Gamma^+_7(C_1)$, and recursively apply the greedy algorithm as described in the proof of Lemma 4 on the graph in which an edge joins $x$ to $y$ whenever $x$ 7-covers $y$. As a result, we obtain centers of total weight at most $w(C_1) + 2 \cdot W'/2 = W$ that cover $V$ within radius $O(\log^* n)$. □
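The first phase in the proof of Theorem 1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. The CCV test is taken to be $\Gamma^-(v) \subseteq \Gamma^+(v)$, the Panigrahy–Vishwanathan notion, with both neighbourhoods containing $v$, and it is evaluated on the restricted graph with unit edges. `reduce_phase` and `reachable_within` are hypothetical names.

```python
from collections import deque


def reachable_within(out_adj, sources, r):
    """Vertices reachable from `sources` in at most r steps (unit edge costs)."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        x = queue.popleft()
        if dist[x] == r:
            continue
        for y in out_adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return set(dist)


def reduce_phase(out_adj, weight):
    """Pick centers at CCVs as in the proof of Theorem 1 (illustrative sketch).

    Returns the chosen centers C1 (in order) and the vertices left active.
    """
    in_adj = {v: set() for v in out_adj}
    for v, outs in out_adj.items():
        for u in outs:
            in_adj[u].add(v)
    active = set(out_adj)
    centers = []
    while True:
        # Assumed CCV test: Gamma^-(v) is a subset of Gamma^+(v).
        ccv = next((v for v in out_adj if v in active
                    and ({v} | in_adj[v]) <= ({v} | out_adj[v])), None)
        if ccv is None:
            return centers, active
        # Lightest vertex that covers the CCV (possibly the CCV itself).
        u = min({ccv} | in_adj[ccv], key=lambda x: weight[x])
        centers.append(u)
        # Mark Gamma^+_3(u) as covered.
        active -= reachable_within(out_adj, {u}, 3)
```

On a directed 3-cycle no vertex is a CCV, so the sketch returns no centers and leaves all three vertices active for the augment phase.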

5 Asymmetric p-Neighbor k-Center

Imagine that we wish to locate $k$ facilities such that the maximum distance of a demand point from its $p$th-closest facility is minimized. As a consequence, failures in $p - 1$ facilities do not bring down the network.
Definition 11 (Asymmetric p-Neighbor k-Center Problem). Let $d_p(S, v)$ denote the distance from the $p$th-closest vertex in $S$ to $v$. The problem is to find a subset $S$ of at most $k$ vertices that minimizes
$$\max_{v \in V \setminus S} d_p(S, v).$$
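The objective of Definition 11 can be evaluated for a given center set as follows. This Python sketch assumes that vertices are numbered $0, \dots, n-1$ and that `dist[u][v]` holds the (possibly asymmetric) distance from $u$ to $v$. The function name is hypothetical.

```python
import math


def p_neighbor_radius(dist, centers, p):
    """max over v not in S of d_p(S, v) for a candidate set S (sketch).

    d_p(S, v) is the p-th smallest value among dist[c][v], c in S.
    """
    if len(centers) < p:
        return math.inf  # fewer than p centers: no vertex has a p-th center
    worst = 0
    for v in range(len(dist)):
        if v in centers:
            continue
        ds = sorted(dist[c][v] for c in centers)
        worst = max(worst, ds[p - 1])
    return worst
```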

We show that we can approximate the asymmetric $p$-neighbor $k$-center problem within a factor of $O(\log^* k)$ if we allow ourselves to use $2k$ centers. Our algorithm is restricted to the case $p \le n/k$, but this is reasonable as $p$ should not be too large [12].

We use the same techniques as usual, including restricted graphs, but in the augment phase we use the greedy algorithm for the Constrained Set Multicover problem [15]. That is, each element $e$ needs to be covered $r_e$ times, but each set can be picked at most once. The $p$-neighbor $k$-center problem has $r_e = p$ for all $e$. We say that an element $e$ is alive if it occurs in fewer than $p$ sets chosen so far. The greedy heuristic is to pick the set that covers the most live elements. It can be shown that this algorithm achieves an approximation factor of $H_n = O(\log n)$ [15]. However, the following result is more appropriate to our application.

Lemma 5 (Greedy Constrained Set Multicover). Let $k$ be the size of an optimum solution to the Constrained Set Multicover problem. The greedy algorithm gives an approximation guarantee of $O(\log(np/k))$.

Proof. The same kind of averaging argument used for standard Set Cover shows that each greedy choice of a set multiplies the total number of unmarked element copies by at most $1 - 1/k$. So after $i$ steps the number of copies of elements yet to be covered is at most $np(1 - 1/k)^i < np\,e^{-i/k}$. Hence after $k\ln(np/k)$ steps the number of uncovered copies of elements is at most $k$. A naive cover of these last $k$ element copies leads to the total number of sets being $k + k\ln(np/k)$. □
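A minimal Python sketch of the greedy rule for Constrained Set Multicover with $r_e = p$ follows. It picks the set with the most live elements and uses each set at most once. The function and parameter names are hypothetical. In the $p$-neighbor application, the sets would presumably be the neighbourhoods $\Gamma^+(v)$ of candidate centers and the elements the active vertices; that mapping is our reading, not code from the paper.

```python
def greedy_constrained_multicover(sets, universe, p):
    """Greedy for Constrained Set Multicover with r_e = p (sketch).

    sets: list of Python sets; each may be chosen at most once.
    Returns the indices of the chosen sets in selection order.
    """
    times_covered = {e: 0 for e in universe}
    available = list(range(len(sets)))
    chosen = []

    def live_gain(i):
        # Elements of sets[i] that are still alive (covered fewer than p times).
        return sum(1 for e in sets[i] if times_covered.get(e, p) < p)

    while any(c < p for c in times_covered.values()):
        best = max(available, key=live_gain, default=None)
        if best is None or live_gain(best) == 0:
            raise ValueError("no feasible multicover with distinct sets")
        available.remove(best)
        chosen.append(best)
        for e in sets[best]:
            if e in times_covered:
                times_covered[e] += 1
    return chosen
```

For example, take the sets `{1,2}, {2,3}, {1,3}, {1,2,3}` with $p = 2$. The sketch picks set 3, then set 0, then set 1.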

If $p \le n/k$ this greedy algorithm gives an approximation factor of $O(\log(n/k))$. Applying the standard recursive approach in [13], which works in the $p$-neighbor case, we can achieve an $O(\log n)$ approximation with $k$ centers, or $O(\log^* n)$ with $2k$ centers. We can lower the approximation guarantee to $O(\log^* k)$, with $2k$ centers, using Archer's LP-based priming. First solve the LP for the constrained set multicover problem. In the solution each vertex is covered by an amount $p$ of fractional centers, out of a total of $k$. We can now use the greedy set cover algorithm to get an initial set of k^ ln k centers that 2-covers every vertex in the active set with at least $p$ centers. Repeatedly applying the greedy procedure for constrained set multicover, this time for $\log^* k + 1$ iterations, we get $2k$ centers that cover all active vertices within $O(\log^* k)$. Alternatively, we could carry out $O(\log k)$ iterations and stick to just $k$ centers.

6 Inapproximability Results 

In this section we give inapproximability results for the asymmetric versions of the $k$-center problem with outliers, the priority $k$-center problem, and the $k$-supplier problem. These problems all admit constant factor approximation algorithms in the symmetric case.

Asymmetric k-Center with Outliers

Definition 12 (k-Center with Outliers and Forbidden Centers). Find a set $S \subseteq C$, where $C$ is the set of vertices allowed to be centers, such that $|S| \le k$ and $S$ covers at least $p$ nodes, with minimum radius.

Theorem 2. For any polynomial time computable function $\alpha(n)$, the asymmetric k-center problem with outliers (and forbidden centers) cannot be approximated within a factor of $\alpha(n)$, unless P = NP.

Proof. We reduce an instance $(U, \mathcal{S}, k)$ of Max Coverage to our problem. Construct vertex sets $A$ and $B$ so that for each set $S \in \mathcal{S}$ there is a vertex $v_S \in A$, and for each element $e \in U$ there is a vertex $v_e \in B$. From every vertex $v_S \in A$, create an edge of unit length to vertex $v_e \in B$ if $e \in S$.

Let $p = |B| + k$, so that if we find $k$ centers that cover $p$ vertices within any finite distance, we must have found $k$ vertices in $A$ that cover all $|B|$ vertices. Hence we have solved the instance of Max Coverage, which is an NP-complete problem. □

Note that the proof never relied on the fact that the $B$ vertices were forbidden from being centers (setting $p$ to $|B| + k$ ensured this).
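The reduction in the proof of Theorem 2 is easy to build explicitly. In this hypothetical helper, vertices are tagged tuples `("set", i)` and `("elem", e)`. $A$ vertices only point to $B$ vertices, and $B$ vertices have no out-edges. Coverage within any finite distance therefore equals coverage within distance 1, which is what `covered_within_one` computes.

```python
def outlier_instance(universe, family, k):
    """Digraph of the Theorem 2 reduction (hypothetical helper).

    One vertex ("set", i) per set and one vertex ("elem", e) per element;
    a unit-length edge ("set", i) -> ("elem", e) whenever e is in family[i].
    Returns the adjacency dict and the coverage target p = |B| + k.
    """
    out_adj = {("set", i): {("elem", e) for e in s} for i, s in enumerate(family)}
    for e in universe:
        out_adj[("elem", e)] = set()
    return out_adj, len(universe) + k


def covered_within_one(out_adj, centers):
    """Vertices at distance at most 1 from some center."""
    result = set(centers)
    for c in centers:
        result |= out_adj[c]
    return result
```

With $U = \{1, 2, 3, 4\}$, sets $\{1,2\}, \{3\}, \{3,4\}, \{2\}$ and $k = 2$, the target is $p = 6$. Choosing sets 0 and 2 reaches 6 vertices, while sets 0 and 1 reach only 5.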

Asymmetric Priority k-Center

Definition 13 (Priority k-Center). Given a priority function $p: V \to \mathbb{Q}^+$ on the vertices, find $S \subseteq V$, $|S| \le k$, that minimizes $R$ so that for every $v \in V$ there exists a center $c \in S$ for which $p_v d_{cv} \le R$.

Theorem 3. For any polynomial time computable function $\alpha(n)$, the asymmetric k-center problem with priorities cannot be approximated within a factor of $\alpha(n)$, unless P = NP.

Proof. The construction of the sets $A$ and $B$ is similar to the proof of Theorem 2, except that we reduce from Set Cover. This time make the set $A$ a complete digraph, with edges of length $\ell$, as well as the unit length set-element edges from $A$ to $B$. Give the nodes in set $A$ priority 1 and the nodes in set $B$ priority $\ell$. An optimal solution to the priority k-center problem is $k$ centers in $A$ and a radius of $\ell$, which covers every vertex. This implies that the $k$ centers cover (in the Set Cover sense) all the elements in $B$. If $k' < k$ centers were chosen from $A$ and $k - k'$ centers were chosen from $B$ instead, we could trivially convert this to a solution choosing $k$ centers from $A$.

Any non-optimal solution requires a radius of at least $\ell^2 + \ell$, as this would involve covering some $B$ vertex by stepping from an $A$ center through another $A$ vertex (distance $\ell + 1$, priority $\ell$). Therefore any algorithm with approximation guarantee $\ell + 1 - \epsilon$ or better would solve Set Cover. We can make $\ell$ any function we like and the result follows. □
Asymmetric k-Supplier

Definition 14 (k-Supplier). Given a set of suppliers $E$ and a set of customers $C$, find a subset $S \subseteq E$ with $|S| \le k$ that minimizes $R$ such that $S$ covers $C$ within $R$.

Theorem 4. For any polynomial time computable function $\alpha(n)$, the asymmetric k-supplier problem cannot be approximated within a factor of $\alpha(n)$, unless P = NP.

Proof. By a reduction from the Max Coverage problem similar to the proof of 
Theorem 2. □ 
Abstract. Given a network with capacities and transit times on the arcs, the quickest flow problem asks for a ‘flow over time’ that satisfies given demands within minimal time. In the setting of flows over time, flow on arcs may vary over time and the transit time of an arc is the time it takes for flow to travel through this arc. In most real-world applications (such as road traffic, communication networks, production systems, etc.), transit times are not fixed but depend on the current flow situation in the network. We consider the model where the transit time of an arc is given as a nondecreasing function of the rate of inflow into the arc. We prove that the quickest s-t-flow problem is NP-hard in this setting and give various approximation results, including an FPTAS for the quickest multicommodity flow problem with bounded cost.



1 Introduction 

Flows over time were introduced more than forty years ago by Ford and Fulkerson [6, 7]. Given a directed graph with capacities and transit times on the arcs, a source node s, a sink node t, and a time horizon T, they consider the problem of sending the maximum possible amount of flow from s to t within T time units. A flow over time
specifies a flow rate for each arc at each point in time. The capacity of an arc is an upper 
bound on this flow rate, i.e., on the amount of flow that can be sent into the arc during 
each unit of time. Flow on an arc progresses at a constant speed which is determined by 
its transit time. 

Known results for flows over time with constant transit times. Ford and Fulkerson show that the maximum s-t-flow over time problem can be solved by essentially one static min-cost flow computation in the given network, where transit times are interpreted as costs. An arbitrary path decomposition of such a static min-cost flow can be turned into a flow over time by sending flow at the given flow rate into each path as long as there is enough time left for the flow on a path to arrive at the sink before time T. A flow featuring this structure is called ‘temporally repeated’.
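The value of a temporally repeated flow follows directly from the description above. A path $P$ with rate $x_P$ and transit time $\tau_P$ is fed during $[0, T - \tau_P)$ and so delivers $x_P \max(0, T - \tau_P)$ units by time $T$. The sketch below uses hypothetical names and takes a path decomposition as given. It does not compute the static min-cost flow itself.

```python
def temporally_repeated_value(paths, T):
    """Flow value of a temporally repeated flow (illustrative sketch).

    paths: list of (rate, transit_time) pairs from a path decomposition,
    where transit_time is the sum of the arc transit times along the path.
    Each path is fed at its rate during [0, T - transit_time), so it
    contributes rate * max(0, T - transit_time).
    """
    return sum(rate * max(0, T - tau) for rate, tau in paths)
```

For two paths with (rate, transit time) equal to (2, 3) and (1, 5), the value is $2 \cdot 7 + 1 \cdot 5 = 19$ for $T = 10$, and $2 \cdot 1 = 2$ for $T = 4$.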

A problem closely related to the maximum s-t-flow over time problem is the quickest s-t-flow problem. Here, the flow value (or ‘demand’) is fixed and the task is to find a flow over time with minimal time horizon T. Clearly, this problem can be solved in polynomial time by incorporating the algorithm of Ford and Fulkerson into a binary search framework. Burkard, Dlaska, and Klinz [2] give a strongly polynomial algorithm for the quickest s-t-flow problem which is based on the parametric search method of Megiddo [15]. Hoppe and Tardos [10, 11] study the quickest transshipment problem which, given supplies and demands at the nodes, asks for a flow over time that zeroes all supplies and demands within minimal time. They give a polynomial time algorithm which is, however, based on a submodular function minimization routine.

The latter fact already indicates that flow over time problems are, in general, considerably harder than their static counterparts in classical network flow theory. Perhaps the best evidence for this claim is provided by a surprising result of Klinz and Woeginger [12]. They show that computing a quickest s-t-flow of minimum cost in a network with cost coefficients on the arcs is already NP-hard in series-parallel networks. Moreover, it is even strongly NP-hard to find a quickest temporally repeated s-t-flow of minimum cost. Only recently, Hall, Hippler, and Skutella [8] showed that computing quickest multicommodity flows is NP-hard, even on series-parallel networks.

On the other hand, Ford and Fulkerson [6, 7] introduce the concept of time-expanded networks, which allow many flow over time problems to be solved in pseudopolynomial time. The node set of a time-expanded network consists of several copies of the node set of the underlying graph, each forming a ‘time layer’. The number of time layers is equal to the integral time horizon T and thus pseudopolynomial in the input size. Copies of an arc of the underlying graph join copies of its end-nodes in time layers whose distances equal the transit time of that arc. Ford and Fulkerson observe that a flow over time in the given graph corresponds to a static flow in the time-expanded network, and vice versa. Thus, many flow over time problems can be solved by static flow computations in the time-expanded network.

Fleischer and Skutella [4] come up with so-called ‘condensed’ time-expanded net- 
works which are of polynomial size and can be used to compute provably good multi- 
commodity flows over time with costs in polynomial time. In particular, they present a 
fully polynomial time approximation scheme (FPTAS) for the quickest multicommod- 
ity flow problem with bounded cost [4, 5]. Using completely different techniques, they 
also show that 2-approximate temporally repeated flows can be obtained from a static, 
length-bounded flow computation in the given graph [4]. The advantage of the latter 
solutions is that they have a very simple structure and also do not use storage of flow at 
intermediate nodes. 

Flow-dependent transit times. So far we have considered the setting of flows over time 
where transit times of arcs are fixed. In many practical applications, however, the latter 
assumption is not realistic since transit times vary with the flow situation on an arc. We 
refer to [1, 16, 17] for an overview and further references. Usually, the correlation of the 
transit time and the flow situation on an arc is highly complex. It is a major challenge to 
come up with a mathematical model that, on the one hand, captures the real behavior as
realistically as possible and, on the other hand, can be solved efficiently even on large 
networks. 

Köhler and Skutella [14] consider a model where, at any moment in time, the actual speed of flow on an arc depends on the current amount of flow on the arc. Under this assumption, they give a 2-approximation algorithm for the quickest s-t-flow problem and show that no polynomial time approximation scheme (PTAS) exists, unless P=NP. A simpler model is studied by Carey and Subrahmanian [3]. They assume that the transit time on an arc only depends on the current rate of inflow into the arc and propose a time-expanded network whose arcs somehow reflect this behavior. Köhler, Langkau, and Skutella [13] give a 2-approximation algorithm for the quickest s-t-flow problem in the setting of inflow-dependent transit times. The algorithm uses the algorithm of Ford and Fulkerson [6, 7] on a so-called ‘bow graph’ with fixed transit times on the arcs. In the bow graph, every arc of the original graph is replaced by a bunch of arcs corresponding to different transit times. The quickest flow problem in the bow graph is a relaxation of the quickest flow problem with inflow-dependent transit times.

Contribution of this paper. While, for the special case of constant transit times, quickest s-t-flows can be computed in polynomial time [2, 6, 7], we show in Section 6 that the problem becomes NP-hard if we allow inflow-dependent transit times. In Section 4, we generalize the 2-approximation result given in [13] to the setting with costs and multiple commodities. Our approach is based on a new and stronger relaxation of the quickest flow problem, which we introduce in Section 3. This relaxation is defined in a bow graph similar to the one introduced in [13], but it uses additional ‘coupling constraints’ between flow values on different copies of one arc in the original graph. In particular, this relaxation can no longer be solved by standard network flow algorithms but requires general linear programming techniques. Nevertheless, as shown in Section 4, the approximation technique based on length-bounded static flows presented in [4] can be generalized to yield provably good solutions to our bow graph relaxation. Moreover, we prove that such a solution to the relaxation can be turned into a feasible multicommodity flow over time with inflow-dependent transit times and bounded cost.

The main result of this paper is a fully polynomial time approximation scheme for 
the quickest multicommodity flow problem with bounded cost and inflow-dependent 
transit times (see Section 5). It again uses the new bow graph relaxation introduced in 
Section 3 and generalizes the approach based on condensed time-expanded networks 
from [5]. Interestingly, the time-expanded version of our bow graph relaxation essen- 
tially coincides with the modified time-expanded graph considered by Carey and Sub- 
rahmanian [3]. 

Due to space limitations, we omit most proofs in this extended abstract. 



2 Preliminaries 

We are considering network flow problems in a directed graph $G = (V, E)$ with $n := |V|$ nodes and $m := |E|$ arcs. Each arc $e \in E$ has associated with it a positive capacity $u_e$ and a nonnegative, nondecreasing transit time function $\tau_e: [0, u_e] \to \mathbb{R}^+$. There is a set of commodities $K = \{1, \dots, k\}$; every commodity $i \in K$ is defined by a source-sink pair¹ $(s_i, t_i) \in V \times V$. The objective is to send a prespecified amount of flow $d_i > 0$, called the demand, from $s_i$ to $t_i$. Finally, each arc $e$ has associated cost coefficients $c_{e,i}$, for $i \in K$, where $c_{e,i}$ is interpreted as the cost (per flow unit) for sending flow of commodity $i$ through the arc. For an arc $e = (v, w) \in E$, we use the notation $\mathrm{head}(e) := w$ and $\mathrm{tail}(e) := v$.

Flows over time with constant transit times. A (multicommodity) flow over time $f$ in $G$ with time horizon $T$ is given by Lebesgue-measurable functions $f_{e,i}: [0, T) \to \mathbb{R}^+$, where $f_{e,i}(\theta)$ is the rate of flow (per time unit) of commodity $i$ entering arc $e$ at time $\theta$. In order to simplify notation, we sometimes use $f_{e,i}(\theta)$ for $\theta \notin [0, T)$, implicitly assuming that $f_{e,i}(\theta) = 0$ in this case. The capacity is an upper bound on the rate of flow entering arc $e$ at any moment of time, i.e., $f_e(\theta) \le u_e$ for all $\theta \in [0, T)$ and $e \in E$. Here, $f_e(\theta) := \sum_{i \in K} f_{e,i}(\theta)$ is the total rate at which flow is entering arc $e$ at time $\theta$.

In the original setting of flows over time, the transit time function of arc $e$ is assumed to be constant. Then, the flow $f_{e,i}(\theta)$ of commodity $i$ entering arc $e$ at time $\theta$ arrives at head(e) at time $\theta + \tau_e$. All arcs must be empty from time $T$ on, i.e., $f_{e,i}(\theta) = 0$ for $\theta \ge T - \tau_e$. To generalize the notion of flow conservation, we define
$$\mathrm{in}_{v,i}(\xi) := \sum_{e \in \delta^-(v)} \int_0^{\xi - \tau_e} f_{e,i}(\theta)\,d\theta$$
to be the total inflow of commodity $i \in K$ into node $v$ until time $\xi \in [0, T]$. Similarly,
$$\mathrm{out}_{v,i}(\xi) := \sum_{e \in \delta^+(v)} \int_0^{\xi} f_{e,i}(\theta)\,d\theta$$
is the corresponding outflow. We consider the model with storage of flow at intermediate nodes. That is, flow entering a node can be held back for some time before it is sent onward. To rule out deficit at any node, we require $\mathrm{in}_{v,i}(\xi) - \mathrm{out}_{v,i}(\xi) \ge 0$ for all $\xi \in [0, T)$, $i \in K$, and $v \in V \setminus \{s_i\}$. Moreover, flow must not remain in any node other than the sinks at time $T$. Therefore, we require that equality holds for every $i \in K$, $v \in V \setminus \{s_i, t_i\}$, at time $\xi = T$. The flow over time $f$ satisfies the multicommodity demands if $\mathrm{in}_{t_i,i}(T) - \mathrm{out}_{t_i,i}(T) = d_i$ for every commodity $i \in K$. The cost of a flow over time $f$ is defined as $c(f) := \sum_{e \in E}\sum_{i \in K} c_{e,i}\int_0^T f_{e,i}(\theta)\,d\theta$.

Time-expanded graphs. Many flow over time problems can be solved by static flow algorithms in time-expanded graphs [6, 7]. Given a graph $G = (V, E)$ with integral transit times on the arcs and an integral time horizon $T$, the $T$-time-expanded graph of $G$, denoted $G^T$, is obtained by creating $T$ copies of $V$, labeled $V_0$ through $V_{T-1}$, with the $\theta$th copy of node $v$ denoted $v(\theta)$, $\theta = 0, \dots, T-1$. For every arc $e = (v, w) \in E$ and $\theta = 0, \dots, T-1-\tau_e$, there is an arc $e(\theta)$ from $v(\theta)$ to $w(\theta + \tau_e)$ with the same capacity and costs as arc $e$. In addition, there is an infinite capacity holdover arc from $v(\theta)$ to $v(\theta + 1)$, for all $v \in V$ and $\theta = 0, \dots, T-2$, which models the possibility to hold flow at node $v$ during the time interval $[\theta, \theta + 1)$.
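The construction of $G^T$ translates directly into code. This Python sketch uses hypothetical names. It returns the arcs as `(tail, head, capacity)` triples, omits costs for brevity, and assumes integral, nonnegative transit times.

```python
import math


def time_expanded_graph(nodes, arcs, T):
    """Build the T-time-expanded graph G^T (illustrative sketch).

    arcs: list of (v, w, capacity, transit) with integral transit >= 0.
    Returns a list of (tail, head, capacity); copies of node v are (v, theta).
    """
    expanded = []
    for v, w, cap, tau in arcs:
        # Arc copy e(theta) from v(theta) to w(theta + tau), theta = 0..T-1-tau.
        for theta in range(T - tau):
            expanded.append(((v, theta), (w, theta + tau), cap))
    for v in nodes:
        # Holdover arcs v(theta) -> v(theta + 1), theta = 0..T-2.
        for theta in range(T - 1):
            expanded.append(((v, theta), (v, theta + 1), math.inf))
    return expanded
```

Take a single arc from `s` to `t` with capacity 3 and transit time 2, and set $T = 4$. The sketch produces 2 arc copies and 3 holdover arcs per node, 8 arcs in total.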

Any static flow in this time-expanded network corresponds to a flow over time of equal cost: interpret the flow on arc $e(\theta)$ as the flow through arc $e = (v, w)$ that starts at node $v$ in the time interval $[\theta, \theta + 1)$. Similarly, any flow over time completing by time $T$ corresponds to a static flow in $G^T$ of the same value and cost, obtained by mapping the total flow starting on $e$ in time interval $[\theta, \theta + 1)$ to flow on arc $e(\theta)$. Thus, we may

¹ To simplify notation, we restrict to the case of only one source and one sink for each commodity. However, our results can be directly generalized to the case of several sources and sinks with given supplies and demands for each commodity.



An FPTAS for Quickest Multicommodity Flows with Inflow-Dependent Transit Times 






solve a flow over time problem by solving the corresponding static flow problem in the 
time-expanded network. 

One drawback of this approach is that the size of $G^T$ depends linearly on $T$, so that if $T$ is not bounded by a polynomial in the input size, this is not a polynomial-time method. However, the following useful observation can be found in [4]: if all transit times are multiples of some large number $\Delta > 0$, then instead of using the $T$-time-expanded graph, we may rescale time and use a $\Delta$-condensed time-expanded graph that contains only $\lceil T/\Delta \rceil$ copies of $V$. Since in this setting every arc corresponds to a time interval of length $\Delta$, capacities are multiplied by $\Delta$. For more details we refer to [4].

Flows with inflow-dependent transit times. In the original setting of flows over time 
discussed above, it is assumed that transit times are fixed throughout, so that flow on 
arc e progresses at a uniform speed. In the following, we will consider the more general 
model of inflow-dependent transit times. Here, the transit time of an arc may vary with 
the current amount of flow using this arc. Each arc $e$ has an associated non-negative transit time function $\tau_e$, which determines the time it takes for flow to traverse arc $e$. Flow of commodity $i$ entering arc $e$ at time $\theta$ at rate $f_{e,i}(\theta)$ arrives at head(e) at time $\theta + \tau_e(f_e(\theta))$. We will later need the following simple observation, which follows from the fact that flow can be stored at intermediate nodes.

Observation 1. For every arc $e \in E$, let $\tau_e: [0, u_e] \to \mathbb{R}^+$ and $\tau'_e: [0, u_e] \to \mathbb{R}^+$ be transit time functions on arc $e$ such that $\tau'_e(x) \le \tau_e(x)$ for all $x \in [0, u_e]$. Then, a flow over time with inflow-dependent transit times $(\tau_e)_{e \in E}$ and time horizon $T$ also yields a flow over time with inflow-dependent transit times $(\tau'_e)_{e \in E}$ and time horizon $T$.

3 The Bow Graph 

In this section, we will define a so-called bow graph that is very similar to the one defined in [13]. Let us for the moment assume that all transit time functions are piecewise constant, non-decreasing, and left-continuous. The transit time function of arc $e$ is denoted by $\tau_e$. It is given by breakpoints $0 = x_0 < x_1 < \dots < x_\ell$ and corresponding transit times $\tau_1 < \dots < \tau_\ell$. Flow entering arc $e$ at rate $x \in (x_{i-1}, x_i]$ needs time $\tau_i$ to traverse arc $e$. Later we will use the fact that general transit time functions can be approximated by such step functions within arbitrary precision.

The bow graph, denoted $G^B = (V^B, E^B)$, is defined on the same node set as $G$, i.e., $V^B := V$, and is obtained by creating several copies of an arc, one for every possible transit time on this arc. Thus, arc $e$ is replaced by $\ell$ parallel bow arcs $a_1, \dots, a_\ell$. The transit time of bow arc $a_i$ is $\tau_i$ and its capacity is $x_i$, $i = 1, \dots, \ell$. We will denote the set of bow arcs corresponding to arc $e \in E$ by $E^B_e$, and refer to $E^B_e$ as the expansion of arc $e$. The cost coefficients of every arc $a \in E^B_e$ are identical to those of $e$, i.e., $c_{a,i} := c_{e,i}$, for $i \in K$.
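The step transit time functions and the expansion $E^B_e$ can be sketched in Python as follows. The names are hypothetical. Mapping the rate $x = 0$ to $\tau_1$ is our own convention, since $0$ lies in none of the intervals $(x_{i-1}, x_i]$.

```python
from bisect import bisect_left


def step_transit_time(breakpoints, times, x):
    """Evaluate a left-continuous step transit time function (sketch).

    breakpoints = [x_0 = 0, x_1, ..., x_l]; times = [tau_1, ..., tau_l].
    A rate x in (x_{i-1}, x_i] gives tau_i; by convention x = 0 gives tau_1.
    """
    if not 0 <= x <= breakpoints[-1]:
        raise ValueError("inflow rate outside [0, x_l]")
    i = max(bisect_left(breakpoints, x), 1)
    return times[i - 1]


def bow_arcs(tail, head, breakpoints, times):
    """Expansion E^B_e of arc e = (tail, head): one bow arc per step.

    Bow arc a_i has capacity x_i and constant transit time tau_i.
    """
    return [
        {"tail": tail, "head": head, "index": i,
         "capacity": breakpoints[i], "transit": times[i - 1]}
        for i in range(1, len(breakpoints))
    ]
```

For breakpoints $0 < 2 < 5$ and transit times $1 < 4$, a rate of $2$ still takes time $1$, while $2.5$ takes time $4$. The expansion has bow arcs with (capacity, transit) pairs $(2, 1)$ and $(5, 4)$.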

3.1 A Relaxation of Inflow-Dependent Transit Times 

We will now discuss the relationship between flows over time with inflow-dependent transit times in $G$ and flows over time in the bow graph $G^B$. Any flow over time $f$ in $G$ with inflow-dependent transit times and time horizon $T$ can be interpreted as a flow over time in $G^B$ (with constant transit times) with the same time horizon $T$: if flow is entering arc $e \in E$ at time $\theta$ with flow rate $f_e(\theta)$, then, in the bow graph, this flow is sent onto the bow arc $a \in E^B_e$ representing the transit time $\tau_e(f_e(\theta))$.

Unfortunately, an arbitrary flow over time in $G^B$ does not correspond to a flow over time $f$ with inflow-dependent transit times $(\tau_e)_{e \in E}$ in $G$. In addition, we have to require the following property: for every original arc $e \in E$ and at every point in time $\theta$, the flow sends flow into at most one bow arc $a \in E^B_e$. A flow over time in $G^B$ fulfilling this property is called inflow-preserving.

Observation 2. Every inflow-preserving flow over time in $G^B$ with time horizon $T$ corresponds to a flow over time $f$ in $G$ with inflow-dependent transit times $(\tau_e)_{e \in E}$ and time horizon $T$, and vice versa.

Notice that the set of inflow-preserving flows over time is not convex. In particular, it is difficult to compute inflow-preserving flows directly. Therefore, we also consider a relaxed notion which can be interpreted as a convexification of inflow-preserving flows: for any arc $a \in E^B$, let $\lambda_a(\theta) := f_a(\theta)/u_a$ denote the per capacity inflow rate into arc $a$ at time $\theta$. Then, a flow over time in $G^B$ with time horizon $T$ is called weakly inflow-preserving if $\sum_{a \in E^B_e} \lambda_a(\theta) \le 1$ for all $e \in E$ and $\theta \in [0, T)$. Since every inflow-preserving flow over time is also weakly inflow-preserving, it follows from Observations 1 and 2 that weakly inflow-preserving flows over time in $G^B$ constitute a relaxation of flows over time with inflow-dependent transit times in $G$:

Observation 3. For every arc $e \in E$, let $\tau_e: [0, u_e] \to \mathbb{R}^+$ and $\tau^*_e: [0, u_e] \to \mathbb{R}^+$ be transit time functions on arc $e$ such that $\tau^*_e$ is a step function with $\tau^*_e(x) \le \tau_e(x)$ for all $x \in [0, u_e]$. Then, every flow over time with inflow-dependent transit times $(\tau_e)_{e \in E}$ and time horizon $T$ in $G$ yields a (weakly) inflow-preserving flow over time with time horizon $T$ in the bow graph $G^B$ defined by the step functions $(\tau^*_e)_{e \in E}$.
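The two conditions can be checked on a time discretization. This sketch rests on an assumption: it tests the conditions only at finitely many sample times, whereas the definitions above require them for every $\theta \in [0, T)$. The function names are hypothetical.

```python
def is_weakly_inflow_preserving(rates, capacities, tol=1e-12):
    """Check sum_a lambda_a(theta) <= 1 at each sampled time (sketch).

    rates[t][i]: inflow rate into bow arc a_i at sample time t.
    capacities[i]: capacity u_{a_i} of bow arc a_i.
    """
    return all(
        sum(r / u for r, u in zip(step, capacities)) <= 1 + tol
        for step in rates
    )


def is_inflow_preserving(rates):
    """At most one bow arc of the expansion receives flow at each sample."""
    return all(sum(1 for r in step if r > 0) <= 1 for step in rates)
```

For bow capacities $(2, 5)$, sending rates $(1, 2.5)$ at the same time is weakly inflow-preserving, since $1/2 + 2.5/5 = 1$. It is not inflow-preserving, because two bow arcs receive flow simultaneously.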

The basic idea of the approximation algorithms presented in this paper is to compute weakly inflow-preserving flows over time in an appropriate bow graph and turn these into flows over time in $G$ with inflow-dependent transit times. The following lemma and its corollary make this approach work. Consider the expansion of a single arc $e \in E$ to bow arcs $E^B_e = \{a_1, \dots, a_\ell\}$.

Lemma 1. Let $f^B$ be a weakly inflow-preserving flow over time with time horizon $T$
in $E^B_e$ and $\delta > 0$. Then, $f^B$ can be turned into an inflow-preserving flow over time $\hat f^B$
in $E^B_e$ such that every (infinitesimal) unit of flow in $\hat f^B$ reaches head(e) at most $\delta$ time
units later than it does in $f^B$.

Proof. For every bow arc $a_i$, $i = 1, \dots, \ell$, we set up a buffer $b_i$ in tail(e) for temporary
storage of flow. The buffer $b_i$ collects all flow in $f^B$ which is about to be shipped
through bow arc $a_i$. It outputs this flow in a first-in-first-out manner, i.e., flow units
must enter and leave the buffer in the same order. Buffer $b_i$ has only two output modes:
either it is closed, and no flow leaves the buffer, or it is open, and flow leaves the
buffer at constant rate $u_{a_i}$, immediately entering arc $a_i$. In our modified solution $\hat f^B$,
at every point in time at most one of the buffers $b_i$, $i = 1, \dots, \ell$, will be open. This
guarantees that $\hat f^B$ is inflow-preserving.
Fig. 1. Original flow rate on bow arc $a$ and modified flow rate produced by buffering in
tail(a).



As above, let $\lambda_a(\theta) := f^B_a(\theta)/u_a$ be the per capacity inflow rate of $f^B$ on arc $a \in E^B_e$
at time $\theta$. We partition the time horizon into intervals of length $\bar\delta$, where $\bar\delta := \delta/2$.
Let $\lambda_{a,j}$ be the average per capacity inflow rate on arc $a \in E^B_e$ during time interval
$[(j-1)\bar\delta, j\bar\delta)$, $j = 1, \dots, \lceil T/\bar\delta \rceil$. We define the modified flow as follows: During
the first $\bar\delta$-round, all buffers are closed. During each following $\bar\delta$-round, we open the
buffers in a 'round robin' fashion. More precisely, during time interval $[j\bar\delta, (j+1)\bar\delta)$,
we first open buffer $b_1$ for $\lambda_{a_1,j}\bar\delta$ time, then buffer $b_2$ for $\lambda_{a_2,j}\bar\delta$ time, and so on. Since
$f^B$ is weakly inflow-preserving, $\sum_{i=1}^{\ell} \lambda_{a_i,j} \le 1$ holds and the last buffer is closed
again before the end of this $\bar\delta$-round. Figure 1 illustrates how the buffer changes the
original inflow rate of a single bow arc $a$.
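The round-robin rule can be written down directly. The Python sketch below is a hedged illustration, assuming the averages $\lambda_{a_i,j}$ are already known (with a 0-based round index in the code); `round_robin_schedule` is a hypothetical helper that returns, for each round, the intervals during which each buffer is open. Because buffer $b_i$ is open for $\lambda_{a_i,j}\bar\delta$ time at rate $u_{a_i}$, it releases exactly the flow it collected in the previous round.

```python
from typing import List, Tuple

Interval = Tuple[int, float, float]  # (buffer index i, open from, open until)


def round_robin_schedule(lam: List[List[float]], dbar: float) -> List[List[Interval]]:
    """Opening times of the buffers b_1, ..., b_l in the proof of Lemma 1 (sketch).

    lam[j][i] is the average per capacity inflow rate of bow arc a_{i+1} during
    round j, i.e. during [j*dbar, (j+1)*dbar) with 0-based j.  The flow collected
    in round j is released in round j+1, so round 0 itself is idle.
    """
    schedule = []
    for j, row in enumerate(lam):
        if sum(row) > 1 + 1e-12:
            raise ValueError(f"round {j}: per capacity rates sum to more than 1")
        t = (j + 1) * dbar                  # start of the next round
        opened = []
        for i, rate in enumerate(row):
            length = rate * dbar            # b_i stays open for lambda_{a_i,j} * dbar
            if length > 0:
                opened.append((i, t, t + length))
            t += length                     # buffers are opened one after another
        schedule.append(opened)
    return schedule


sched = round_robin_schedule([[0.5, 0.25], [0.0, 1.0]], dbar=2.0)
print(sched)  # [[(0, 2.0, 3.0), (1, 3.0, 3.5)], [(1, 4.0, 6.0)]]
```

Every interval produced for round $j$ ends no later than $(j+2)\bar\delta$, which is the code-level counterpart of "the last buffer is closed again before the end of this round".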

We show that the buffers are never empty while they are open. Consider bow arc $a_i$.
During the interval $[(j-1)\bar\delta, j\bar\delta)$, the flow $f^B$ sends $\bar\delta\lambda_{a_i,j}u_{a_i}$ units of flow into bow
arc $a_i$. This is exactly the amount of flow that the corresponding buffer $b_i$ sends
out during the succeeding interval $[j\bar\delta, (j+1)\bar\delta)$. Hence buffer $b_i$ is never emptied while it
is open and, in particular, every unit of flow is delayed for at most $2\bar\delta = \delta$ time. Note that
throughout these modifications no flow is rerouted; we only make use of storage in
nodes. Therefore, the cost of $f^B$ remains unchanged. □

For $\delta > 0$, we call a flow over time in $G^B$ $\delta$-resting if, for every node
$v \in V \setminus \{s_1, \dots, s_k\}$, all flow arriving at $v$ is stored there for at least $\delta$ time units before
it moves on. A weakly inflow-preserving flow over time $f^B$ in $G^B$ which is $\delta$-resting
can easily be interpreted as an inflow-preserving flow over time $\hat f^B$: Consider a single
arc $e \in E$ and its expansion $E^B_e$. Applying Lemma 1, the flow over time $f^B$ restricted
to $E^B_e$ can be modified to an inflow-preserving flow over time such that every unit of
flow is delayed by at most $\delta$. The resting property of $f^B$ makes up for this delay and
ensures that every such flow unit can continue its way on time. Applying Observation 2,
the flow $\hat f^B$ can then be interpreted as a flow over time $f$ in $G$ with inflow-dependent
transit times $(\tau^B_e)_{e\in E}$.

Corollary 1. Let $f^B$ be a weakly inflow-preserving flow over time in $G^B$ with time
horizon $T$ which is $\delta$-resting. Then, $f^B$ can be turned into a flow over time $f$ in $G$
with inflow-dependent transit times $(\tau^B_e)_{e\in E}$ and with the same time horizon and the
same cost as $f^B$. Moreover, the flow over time $f$ is given by piecewise constant functions
$(f_e)_{e\in E}$ such that the number of breakpoints of $f_e$ is bounded by $2|E^B_e|\lceil T/\delta \rceil$.
4 A $(2+\varepsilon)$-Approximation Algorithm for Quickest Flows

In this section we present a fairly simple $(2+\varepsilon)$-approximation algorithm for the
quickest multicommodity flow problem with inflow-dependent transit times. The algorithm
consists of the following three main steps. First, the original transit times $(\tau_e)_{e\in E}$ are
replaced by lower step functions $(\tau^B_e)_{e\in E}$ and the corresponding bow graph is
constructed. Then, an appropriately modified version of the $(2+\varepsilon)$-approximation
algorithm presented in [4] is applied, yielding a weakly inflow-preserving flow over time
in $G^B$. Finally, the output is turned into a feasible solution to the original problem.
The bow graph $G^B$ is defined in the first step according to step functions fulfilling
the requirements stated in the following observation. We will later specify the
parameters $\delta, \eta > 0$ such that the size of the resulting bow graph is polynomial in the input
size and $1/\varepsilon$.

Observation 4. Let $\delta, \eta > 0$. For every non-negative, non-decreasing, and left-continuous
function $\tau : [0,u] \to \mathbb{R}^+$ there exists a step function $\tau^B : [0,u] \to \mathbb{R}^+$ with

(i) $\tau^B(x) \le \tau(x) \le (1+\eta)\,\tau^B(x) + \delta$ for every $x \in [0,u]$,

(ii) the number of breakpoints of $\tau^B$ is bounded by $\lceil \log_{1+\eta}(\tau(u)/\delta) \rceil + 1$.
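One natural way to obtain such a step function (not necessarily the construction intended by the authors) is to round $\tau(x)$ down to the geometric grid $\{0\} \cup \{\delta(1+\eta)^k : k \ge 0\}$: values below $\delta$ are rounded to 0, and every other value lies within a factor $1+\eta$ of its grid point. The Python sketch below, with hypothetical function names, checks both properties on sampled points of one example function.

```python
import math
from typing import Callable, List


def lower_level(t: float, delta: float, eta: float) -> float:
    """Round t >= 0 down to the grid {0} U {delta * (1 + eta)**k : k >= 0}."""
    if t < delta:
        return 0.0
    level = delta
    while level * (1 + eta) <= t:
        level *= 1 + eta
    return level


def check_observation_4(tau: Callable[[float], float], u: float,
                        delta: float, eta: float, samples: int = 2001) -> bool:
    xs: List[float] = [u * s / (samples - 1) for s in range(samples)]
    steps = [lower_level(tau(x), delta, eta) for x in xs]
    # (i): tau^B(x) <= tau(x) <= (1 + eta) * tau^B(x) + delta at every sample
    ok_i = all(b <= tau(x) <= (1 + eta) * b + delta for x, b in zip(xs, steps))
    # (ii): tau non-decreasing => tau^B non-decreasing, so breakpoints = #values - 1
    breakpoints = len(set(steps)) - 1
    bound = math.ceil(math.log(tau(u) / delta, 1 + eta)) + 1
    return ok_i and breakpoints <= bound


print(check_observation_4(lambda x: x * x + 0.3, u=10.0, delta=0.5, eta=0.25))  # True
```

The number of grid values not exceeding $\tau(u)$ is at most $\lfloor \log_{1+\eta}(\tau(u)/\delta) \rfloor + 2$ (counting 0), which gives the breakpoint bound in (ii) whenever $\tau(u) \ge \delta$.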

4.1 $(2+\varepsilon)$-Approximate Quickest Weakly Inflow-Preserving Flows

Fleischer and Skutella [4] propose a $(2+\varepsilon)$-approximation algorithm for the
quickest multicommodity flow problem with bounded cost and constant transit times. The
method is based on an approximate length-bounded static flow computation. The same
approach can be applied to the problem of finding a quickest weakly inflow-preserving
multicommodity flow over time with bounded cost in the bow graph.

Let $f^B$ be an optimal solution to this problem with minimal time horizon $T$. As
suggested in [4], we consider the static multicommodity flow $x^B$ in $G^B$ which results
from averaging the flow $f^B$ on every arc $a \in E^B$ over the time interval $[0,T)$. As
proven in [4], this static flow (i) satisfies a fraction of $1/T$ of the demands covered by
the flow over time $f^B$, (ii) has cost $c(x^B) = c(f^B)/T$, and (iii) is $T$-length-bounded.
The latter property means that the flow of every commodity $i \in K$ can be decomposed
into a sum of flows on $s_i$-$t_i$-paths such that the length $\tau(P) := \sum_{a \in P} \tau_a$ of every such
path $P$ is at most $T$. Since $f^B$ is weakly inflow-preserving, so is $x^B$, i.e., its per capacity
flow values $\lambda_a := x^B_a/u_a$, $a \in E^B$, satisfy $\sum_{a \in E^B_e} \lambda_a \le 1$ for every arc $e \in E$. We
refer to this property as property (iv).

Any static flow $x$ in $G^B$ meeting requirements (i)–(iv) can be turned into a weakly
inflow-preserving flow over time $g$ in $G^B$ meeting the same demands at the same
cost as $f^B$ within time $2T$: Send flow into every $s_i$-$t_i$-path $P$ given by the
length-bounded path decomposition of $x$ at the corresponding flow rate $x_{P,i}$ for exactly $T$ time
units; then wait for at most another $T$ time units until all flow has arrived at its destination.
Since $g_a(\theta)/u_a$ is always upper-bounded by $x_a/u_a$, it follows from property (iv) that $g$
is weakly inflow-preserving. Thus, $g$ is a 2-approximate solution to the problem under
consideration.
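The step from a static flow to a flow over time is simple enough to sketch. The Python below is an illustration only, assuming a path decomposition with constant path lengths is given; `PathFlow` and `repeat_static_flow` are hypothetical names. Since the static flow satisfies a $1/T$ fraction of the demands (property (i)), sending it for $T$ time units meets the full demands, and length-boundedness caps the horizon at $2T$.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PathFlow:
    commodity: int
    length: float  # tau(P): sum of the constant transit times of the bow arcs on P
    rate: float    # x_{P,i}: flow rate of commodity i on P in the static flow


def repeat_static_flow(paths: List[PathFlow], T: float) -> Tuple[Dict[int, float], float]:
    """Send every path flow at its static rate during [0, T) (hypothetical helper).

    Returns the amount delivered per commodity and the resulting time horizon.
    """
    if any(p.length > T for p in paths):
        raise ValueError("path decomposition is not T-length-bounded")
    delivered: Dict[int, float] = {}
    for p in paths:
        delivered[p.commodity] = delivered.get(p.commodity, 0.0) + p.rate * T
    # Flow enters P until time T and needs tau(P) <= T more time to arrive.
    horizon = T + max((p.length for p in paths), default=0.0)
    return delivered, horizon


paths = [PathFlow(0, 3.0, 0.5), PathFlow(0, 5.0, 0.25), PathFlow(1, 4.0, 1.0)]
print(repeat_static_flow(paths, T=5.0))  # ({0: 3.75, 1: 5.0}, 10.0)
```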

Unfortunately, computing the $T$-length-bounded flow $x$ is NP-hard, even for the
special case of a single commodity [9]. Yet, as discussed in [4], the $T$-length-bounded
multicommodity flow problem can be approximated within arbitrary precision in
polynomial time by slightly relaxing the length bound $T$. It is easy to generalize this
observation to length-bounded, weakly inflow-preserving flows. This finally yields a
$(2+\varepsilon)$-approximate solution.

Lemma 2. Assume that there exists a weakly inflow-preserving multicommodity flow
over time with time horizon $T$ and cost at most $C$. Then, for every $\varepsilon > 0$, a weakly
inflow-preserving multicommodity flow over time with time horizon at most $(2+\varepsilon)T$
and cost at most $C$ can be computed in time polynomial in the input size and $1/\varepsilon$.

If all transit time functions $\tau_e$ are constant, the $(2+\varepsilon)$-approximation algorithm in
Lemma 2 and the one presented in [4] basically coincide. In [4], an example is given
which shows that the performance guarantee of both algorithms is not better than 2.

4.2 $(2+\varepsilon)$-Approximate Quickest Flows with Inflow-Dependent Transit Times

So far, we have presented an algorithm to compute a $(2+\varepsilon)$-approximate solution
to the quickest multicommodity flow problem in the relaxed model of weakly
inflow-preserving flows over time. Such a solution has a simple structure, namely it is generated
from a path decomposition of a static flow in the bow graph. We will use this property
to turn such a flow into a solution to the original problem. Throughout this modification
we will make sure that the time horizon only increases by a small factor.

Let $g$ be a weakly inflow-preserving multicommodity flow over time with time
horizon $T^B$ in $G^B$, which is generated from a static flow $x^B$ as described in the last
section. In particular, $g$ is weakly inflow-preserving and has a length-bounded path
decomposition. Let $\mathcal{P}_i$ denote the set of $s_i$-$t_i$-paths from the length-bounded path
decomposition of $x^B$ and $\mathcal{P} := \bigcup_{i \in K} \mathcal{P}_i$.

Lemma 3. The flow over time $g$ can be turned into a flow over time $f$ in $G$ with
inflow-dependent transit times $(\tau_e)_{e\in E}$ and time horizon $T$, where $T$ is bounded from
above by $(1+\eta)T^B + 2n\delta$.

We are now ready to state the main result of this section. 

Theorem 1. For the quickest multicommodity flow problem with inflow-dependent
transit times and bounded cost, there exists a polynomial time algorithm that, for any $\varepsilon > 0$,
finds a solution of the same cost as optimal with time horizon at most $2+\varepsilon$ times the
optimal time horizon $T^*$.

5 An FPTAS for Quickest Flows 

In this section we present an FPTAS for the quickest multicommodity flow problem 
with inflow-dependent transit times and bounded cost. We use ideas similar to the ones 
employed in [5] for the problem with fixed transit times. The FPTAS is based on a static 
weakly inflow-preserving flow computation in a condensed time-expanded bow graph. 

Theorem 2. There is an FPTAS for the quickest multicommodity flow problem with 
inflow-dependent transit times and bounded cost. 
5.1 The Algorithm

To state our algorithm and prove its correctness we define the following two bow
graphs: Given $G = (V,E)$ with transit time functions $(\tau_e)_{e\in E}$ and a time horizon $T$,
let $\underline{G}^B$ denote the lower bow graph constructed from the lower step functions
$\underline{\tau}_e(x) := \lfloor \tau_e(x)/\Delta \rfloor\,\Delta$, for $e \in E$, $x \in [0,u_e]$. Here, $\Delta := \varepsilon^2 T/n$ for a given small
constant $\varepsilon > 0$ (we assume that $n/\varepsilon^2$ is integral such that $T$ is a multiple of $\Delta$). That
is, $\tau_e(x)$ is rounded down to the nearest multiple of $\Delta$. By choice of $\Delta$, the size of $\underline{G}^B$
is polynomially bounded since we can delete all arcs with transit times greater than $T$.
The second graph is the $2\Delta$-lengthened bow graph, denoted by $\overline{G}^B$, which is
constructed from $\underline{G}^B$ by lengthening the transit time of each arc by $2\Delta$. The corresponding
transit time step functions are given by $\overline{\tau}_e(x) := \underline{\tau}_e(x) + 2\Delta$, for $e \in E$, $x \in [0,u_e]$.
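The rounding works on exact multiples of $\Delta$, so a minimal Python sketch can use `fractions.Fraction` to avoid floating-point errors at grid points. The helper names are hypothetical. Rounding a transit time down to a multiple of $\Delta$ loses less than $\Delta$, so adding $2\Delta$ afterwards yields a value that exceeds the original transit time by more than $\Delta$ and by at most $2\Delta$.

```python
from fractions import Fraction
from math import floor


def grid_step(T: Fraction, n: int, eps: Fraction) -> Fraction:
    """Delta := eps^2 * T / n, assuming n / eps^2 is integral."""
    if (n / eps**2).denominator != 1:
        raise ValueError("n / eps^2 must be an integer")
    return eps**2 * T / n


def lower_tau(t: Fraction, Delta: Fraction) -> Fraction:
    """Round a transit time down to the nearest multiple of Delta."""
    return floor(t / Delta) * Delta


def lengthened_tau(t: Fraction, Delta: Fraction) -> Fraction:
    """Transit time in the 2*Delta-lengthened bow graph."""
    return lower_tau(t, Delta) + 2 * Delta


Delta = grid_step(Fraction(8), n=2, eps=Fraction(1, 2))
t = Fraction("3.7")
print(Delta, lower_tau(t, Delta), lengthened_tau(t, Delta))  # 1 3 5
```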

Let the fan graph $G^F = (V^F, E^F)$ be the $\Delta$-condensed time-expansion of $\overline{G}^B$
for time horizon $T$ (see Section 2). Each arc $e \in E$ is represented in the bow graph
$\overline{G}^B$ by its expansion $\overline{E}^B_e$. Thus, the fan graph contains, for each time $\theta \in S :=
\{0, \Delta, \dots, T-\Delta\}$, a 'fan' of arcs $E^F_e(\theta) := \{a(\theta) : a \in \overline{E}^B_e,\ \theta + \tau_a \in S\}$, where
$a(\theta) = (v(\theta), w(\theta + \tau_a))$ for $a = (v,w)$. For a static flow $x$ in $G^F$, we define $\lambda_{a(\theta)} := x_{a(\theta)}/u_{a(\theta)}$ to
be the per capacity inflow value on arc $a(\theta) \in E^F$. With these definitions, the concept
of (weakly) inflow-preserving flows directly carries over to static flows in $G^F$.
Moreover, the problem of computing a weakly inflow-preserving static flow in $G^F$ can easily
be formulated as a linear program: take a standard network flow formulation and add
an extra constraint for each fan in $G^F$. In particular, such a flow can be computed in
polynomial time. Note that any (weakly) inflow-preserving static flow in $G^F$ directly
corresponds to a (weakly) inflow-preserving flow over time in $\overline{G}^B$, as described in
Section 1.
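The fan structure can be sketched in a few lines of Python, under the assumption that all bow-arc transit times are integer multiples of $\Delta$ (which holds after rounding) and with time measured in layers $k$, where $\theta = k\Delta$. The function names are hypothetical; the second function is the kind of extra per-fan constraint that would be added to a standard network flow LP.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, int]              # (original node, layer k); layer k <-> time k*Delta
FanArc = Tuple[str, Node, Node]     # (bow arc name, tail copy, head copy)


def fans_of_arc(v: str, w: str, bow_tau: Dict[str, int], layers: int) -> Dict[int, List[FanArc]]:
    """Fans of one original arc e = (v, w) in a Delta-condensed time expansion.

    bow_tau maps each bow arc of e to its constant transit time, measured in
    multiples of Delta; layers = T / Delta.  The fan at layer k contains a copy
    a(k) = (v(k), w(k + tau_a)) of every bow arc a that still ends inside [0, T).
    """
    return {
        k: [(a, (v, k), (w, k + t)) for a, t in sorted(bow_tau.items()) if k + t < layers]
        for k in range(layers)
    }


def fan_is_weakly_inflow_preserving(x: Dict[str, float], cap: Dict[str, float],
                                    fan: List[FanArc]) -> bool:
    """Per-fan constraint: the sum of x_{a(theta)} / u_{a(theta)} over the fan is <= 1."""
    return sum(x[a] / cap[a] for a, _, _ in fan) <= 1 + 1e-12


fans = fans_of_arc("v", "w", {"a1": 0, "a2": 2}, layers=4)
print([len(fans[k]) for k in range(4)])  # [2, 2, 1, 1]
```

Summed over all arcs, the fans give the $O(mn^2/\varepsilon^4)$ arc count quoted in the running-time discussion, since there are $n/\varepsilon^2$ layers and each fan has at most one arc per layer.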

Let T* denote the time horizon of a quickest flow with inflow-dependent transit 
times in G. We can now give an overview of our algorithm: 



FPTAS for Quickest Flows with Inflow-Dependent Transit Times

1. Guess $T$ such that $T^* \le T \le (1+O(\varepsilon))T^*$. This is done via geometric mean
binary search, starting with good upper and lower bounds, obtained, e.g., with the help
of the $(2+\varepsilon)$-approximation in Section 4.

2. Construct the fan graph $G^F$ for time horizon $T$ and compute a weakly
inflow-preserving static multicommodity flow satisfying all demands at minimum cost.

3. Interpret this static flow as a weakly inflow-preserving flow over time in $\overline{G}^B$.
Modify this flow to make it inflow-preserving and, from this, derive a flow over time
in $G$ with inflow-dependent transit times and time horizon at most $T$.
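Step 1 can be sketched as follows. The feasibility oracle stands in for running step 2 with a trial horizon; treating that as a monotone yes/no test is an assumption of this sketch, and the function name is hypothetical. Bisecting on a logarithmic scale halves $\log(\text{upper}/\text{lower})$ in every round, so if the initial bounds are within a constant factor of each other, $O(\log(1/\varepsilon))$ rounds suffice.

```python
import math
from typing import Callable


def geometric_mean_search(lower: float, upper: float,
                          feasible: Callable[[float], bool], eps: float) -> float:
    """Find T with T* <= T <= (1 + eps) * T* by bisecting on a logarithmic scale.

    Assumes lower <= T* <= upper, that feasible(upper) holds, and that
    feasible(T) is True exactly for T >= T* (a monotone oracle).
    """
    if not feasible(upper):
        raise ValueError("upper bound must be feasible")
    while upper > (1 + eps) * lower:
        mid = math.sqrt(lower * upper)   # geometric mean of the current bounds
        if feasible(mid):
            upper = mid
        else:
            lower = mid
    return upper


T_star = 3.7
calls = []


def oracle(T: float) -> bool:
    calls.append(T)
    return T >= T_star


T = geometric_mean_search(T_star / 2.1, T_star * 1.05, oracle, eps=0.01)
print(T_star <= T <= (1 + 0.01) * T_star)  # True
```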



We now proceed as follows: First we discuss issues related to the running time of
the algorithm and detail how step 3 is implemented. Then, in the next section, we prove
that a static flow in $G^F$ with the properties claimed in step 2 actually exists.

The upper and lower bounds obtained from the $(2+\varepsilon)$-approximation in step 1
are within a constant factor of each other. Thus, the estimate $T$ can be found within
$O(\log(1/\varepsilon))$ geometric mean binary search steps. The fan graph $G^F$ constructed in
step 2 contains $O(n^2/\varepsilon^2)$ nodes and $O(mn^2/\varepsilon^4)$ arcs; note that each fan contains
$O(n/\varepsilon^2)$ arcs, potentially one for each layer of $G^F$. Therefore, the static flow in $G^F$ can
be computed in polynomial time. We now go into the details of step 3. As mentioned
before, interpreting the static flow in $G^F$ as a weakly inflow-preserving flow over time
in $\overline{G}^B$ is done in the canonical way, as described in Section 1. If we now shorten all
arcs of $\overline{G}^B$ by $\Delta$ (we refer to the resulting bow graph as $\tilde{G}^B$), we obtain a weakly
inflow-preserving flow over time in $\tilde{G}^B$ which is $\Delta$-resting. Applying Corollary 1, we
derive an inflow-preserving flow over time in $\tilde{G}^B$. Finally, by Observation 1, we get a
flow over time in $G$ with inflow-dependent transit times $(\tau_e)_{e\in E}$ with time horizon at
most $T$. Clearly, step 3 can be done in polynomial time.



5.2 Transforming a Flow over Time in $G$ to a Static Flow in $G^F$

In this section we prove that our algorithm actually is an FPTAS by showing that a
feasible flow as claimed in step 2 exists. To this end, we transform a quickest flow in $G$
with inflow-dependent transit times to a weakly inflow-preserving static flow in $G^F$ and
thereby lengthen the time horizon by at most a factor of $1 + O(\varepsilon)$. This transformation
is done in several steps which are illustrated in the following diagram:



infl.-dep. flow over time in $G$, time horizon $T^*$
  → (1) infl.-pres. flow over time in $\underline{G}^B$, time horizon $T^*$
  → (2) weakly infl.-pres. flow over time in $\overline{G}^B$, time horizon $\le T$
  → (3) weakly infl.-pres. static flow in $G^F$, time horizon $\le T$

With Observation 3, step (1) is easy to see. For step (3), flow in $\overline{G}^B$ is mapped to $G^F$ as
described in Section 1: the total flow entering arc $a \in \overline{E}^B$ in the interval $[\theta, \theta + \Delta)$ is
assigned to $a(\theta) \in E^F$, for $\theta \in S$. Clearly, if the flow was (weakly) inflow-preserving
in $\overline{G}^B$, it will be weakly inflow-preserving in $G^F$, too. Step (2) is the most interesting
but also the most intricate one. It is done similarly to [5] by carefully averaging flow
to derive an 'almost feasible' flow, then subsequently sending less to obtain a feasible
flow and finally increasing the time horizon to meet the demands (we refer to [5] for
details). We can adopt this method since the transit times in bow graphs $\underline{G}^B$ and $\overline{G}^B$
are constant. However, in contrast to [5], our flows must have the additional property of
being weakly inflow-preserving.
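Mapping a flow over time onto the fan graph amounts to bucketing inflow into intervals of length $\Delta$. The Python sketch below is illustrative only: it assumes the flow over time is piecewise constant on a finer grid of width `dt`, and it assumes that the capacity of a fan arc $a(\theta)$ is $\Delta u_a$ (the text only writes $u_{a(\theta)}$). Under these assumptions, if the per capacity rates of one expansion sum to at most 1 at every moment, the same holds for the per capacity loads of every fan, because the load of a fan is an average of those sums.

```python
from typing import Dict, List

Rates = Dict[str, List[float]]


def condense(rates: Rates, dt: float, Delta: float) -> Dict[str, List[float]]:
    """x[a][k] = total flow entering bow arc a during [k*Delta, (k+1)*Delta).

    rates[a][s] is the constant inflow rate of a during [s*dt, (s+1)*dt);
    Delta is assumed to be an integer multiple of dt.
    """
    per = round(Delta / dt)
    if per < 1 or abs(per * dt - Delta) > 1e-9:
        raise ValueError("Delta must be a positive multiple of dt")
    return {a: [sum(rs[s:s + per]) * dt for s in range(0, len(rs), per)]
            for a, rs in rates.items()}


def fan_loads(x: Dict[str, List[float]], u: Dict[str, float], Delta: float) -> List[float]:
    """Per capacity load of each fan, taking Delta * u_a as the capacity of a(theta)."""
    layers = len(next(iter(x.values())))
    return [sum(x[a][k] / (Delta * u[a]) for a in x) for k in range(layers)]


u = {"a1": 2.0, "a2": 3.0}
f = {"a1": [1.0, 1.0, 0.0, 2.0], "a2": [0.0, 1.5, 3.0, 0.0]}  # weakly inflow-preserving
x = condense(f, dt=0.5, Delta=1.0)
print(x, fan_loads(x, u, Delta=1.0))  # {'a1': [1.0, 1.0], 'a2': [0.75, 1.5]} [0.75, 1.0]
```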

Lemma 4. A (weakly) inflow-preserving flow over time $f$ in $\underline{G}^B$ with time horizon $T^*$
can be transformed into a weakly inflow-preserving flow over time in $\overline{G}^B$ with time
horizon at most $T := (1 + O(\varepsilon))T^*$ and the same cost as $f$.

This concludes the proof of Theorem 2. 



6 Complexity 

Theorem 3. The quickest s-t-flow problem with inflow-dependent transit times, with or 
without storage of flow at intermediate nodes, is NP-hard in the strong sense. 

The proof uses a reduction from the well-known NP-complete problem 3-PARTITION. 
Abstract. We study the complexity of bounded variants of graph problems,
mainly the problem of k-Dimensional Matching (k-DM), namely,
the problem of finding a maximal matching in a k-partite k-uniform balanced
hyper-graph. We prove that k-DM cannot be efficiently approximated
to within a factor of $O(\frac{k}{\ln k})$ unless P = NP. This improves the
previous factor of $\frac{k}{2^{O(\sqrt{\ln k})}}$ by Trevisan [Tre01]. For low k values we prove
NP-hardness factors of $\frac{54}{53} - \varepsilon$, $\frac{30}{29} - \varepsilon$ and $\frac{23}{22} - \varepsilon$ for 4-DM, 5-DM and 6-DM
respectively. These results extend to the problem of k-Set-Packing and
the problem of Maximum Independent-Set in $(k+1)$-claw-free graphs.



1 Introduction 

Bounded variants of optimization problems are often easier to approximate than 
the general, unbounded problems. The Independent-Set problem illustrates this 
well: it cannot be approximated to within unless P = NP [Has99], 

nevertheless, once the input graph has a bounded degree $d$, much better
approximations exist (e.g., an $O(d\log\log d/\log d)$ approximation by [Vis96]).

We next examine some bounded variants of the set-packing (SP) problem
and try to illustrate the connection between the bounded parameters (e.g., set
size, number of occurrences of elements) and the complexity of the bounded problem.

In the problem of SP, the input is a family of sets $S_1, \dots, S_n$, and the objective
is to find a maximal packing, namely a maximal number of pairwise disjoint sets
from the family. This problem is often phrased in terms of hyper-graphs: we have
a vertex $v_x$ for each element $x$ and a hyper-edge $e_S$ for each set $S$ of the family
(containing all vertices $v_x$ which correspond to the elements $x$ in the set $S$). The
objective is to find a maximal matching. Alternatively, one can formulate this
problem using the dual graph: a vertex $v_S$ for each set $S$ and a hyper-edge $e_x$ for
each element $x$ ($v_S$ is contained in all edges $e_x$ such that $x \in S$). The objective is
to find a maximal independent set (namely, a maximal number of vertices such
that no two of them are contained in the same edge).
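The two equivalent views can be checked side by side on a toy family. The following Python sketch (hypothetical helper names; brute force, so only for tiny inputs) finds a maximum packing and confirms that the chosen sets form an independent set in the dual hyper-graph.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def is_packing(sets: Iterable[FrozenSet[int]]) -> bool:
    """True if the given sets are pairwise disjoint (a matching in the hyper-graph)."""
    seen: Set[int] = set()
    for s in sets:
        if seen & s:
            return False
        seen |= s
    return True


def dual_edges(family: Dict[str, FrozenSet[int]]) -> Dict[int, FrozenSet[str]]:
    """Dual hyper-graph: one edge e_x per element x, containing every v_S with x in S."""
    edges: Dict[int, Set[str]] = {}
    for name, s in family.items():
        for x in s:
            edges.setdefault(x, set()).add(name)
    return {x: frozenset(vs) for x, vs in edges.items()}


def is_independent(chosen: Set[str], edges: Dict[int, FrozenSet[str]]) -> bool:
    """No two chosen vertices v_S lie in a common dual edge."""
    return all(len(chosen & e) <= 1 for e in edges.values())


def max_packing(family: Dict[str, FrozenSet[int]]) -> Set[str]:
    """Exponential brute force, only for tiny examples."""
    names = sorted(family)
    for r in range(len(names), 0, -1):
        for combo in combinations(names, r):
            if is_packing(family[n] for n in combo):
                return set(combo)
    return set()


family = {"A": frozenset({1, 2}), "B": frozenset({2, 3}),
          "C": frozenset({3, 4}), "D": frozenset({5})}
best = max_packing(family)
print(sorted(best), is_independent(best, dual_edges(family)))  # ['A', 'C', 'D'] True
```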

* Research supported in part by the Fund for Basic Research Administered by the 
Israel Academy of Sciences, and a Bikura grant. 

S. Arora et al. (Eds.): APPROX 2003+RANDOM 2003, LNCS 2764, pp. 83-97, 2003.
The general problem of SP has been extensively studied (for example [Wig83],
[BYM84], [BH92], [Has99]). Quite tight approximation algorithms and
inapproximability factors are known for this problem. Hastad [Has99] proved that
Set-Packing cannot be approximated to within $N^{1-\varepsilon}$ unless NP $\subseteq$ ZPP (for
every $\varepsilon > 0$, where $N$ is the number of sets). The best approximation algorithm
achieves an approximation ratio of [BH92]. In contrast, the case of
bounded variants of this problem seems to be of a different nature.

1.1 Bounded Variants of Set-Packing 

For bounded variants it seems natural to think of SP using hyper-graph notions.
One may think of two natural bounds: the size of the edges (size of the sets)
and the degree of the vertices (number of occurrences of each element). For
example, k-Set-Packing (k-SP) is the problem where the size of the hyper-edges
is bounded by $k$. If we also bound the degree of the vertices by two, this becomes
the problem of maximum independent set in $k$-bounded-degree graphs (k-IS)
(recall the dual graph defined above).

Another natural bound is the colorability of the input graph. Consider the
problem of 3-Dimensional Matching (3-DM). It is a variant of 3-SP where
the vertices of the input hyper-graph are a union of three disjoint sets,
$V = V_1 \uplus V_2 \uplus V_3$, and each hyper-edge contains exactly one vertex from each set,
namely, $E \subseteq V_1 \times V_2 \times V_3$. In other words, the vertices of the hyper-graph can be
colored using 3 colors, so that no hyper-edge contains the same color
twice. A graph having this property is called 3-strongly-colorable (in general,
k-strongly-colorable). Thus the color-bounded version of k-SP, namely the problem
of k-DM, is:

Definition 1 (k-DM). k-Dimensional Matching

Input: A $k$-uniform $k$-strongly-colorable hyper-graph $H = (V_1, \dots, V_k, E)$.

Problem: Find a matching of maximal size in $H$.
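Definition 1 can be checked mechanically. In this Python sketch (hypothetical function names), the parts $V_1, \dots, V_k$ play the role of the $k$ colors, and a second helper tests whether a set of hyper-edges is a matching.

```python
from typing import FrozenSet, List, Set


def is_kdm_instance(parts: List[Set[str]], edges: List[FrozenSet[str]]) -> bool:
    """Check that (V_1, ..., V_k, E) is k-uniform and k-strongly-colorable.

    The parts V_1, ..., V_k act as the color classes: they must be pairwise
    disjoint and every hyper-edge must contain exactly one vertex of each part.
    """
    k = len(parts)
    all_vertices = set().union(*parts)
    if len(all_vertices) != sum(len(p) for p in parts):
        return False  # parts overlap
    return all(len(e) == k and all(len(e & p) == 1 for p in parts) for e in edges)


def is_matching(edges: List[FrozenSet[str]]) -> bool:
    """True if the hyper-edges are pairwise disjoint."""
    used: Set[str] = set()
    for e in edges:
        if used & e:
            return False
        used |= e
    return True


V = [{"a1", "a2"}, {"b1", "b2"}, {"c1", "c2"}]
E = [frozenset({"a1", "b1", "c1"}), frozenset({"a2", "b2", "c1"}),
     frozenset({"a2", "b2", "c2"})]
print(is_kdm_instance(V, E), is_matching([E[0], E[2]]))  # True True
```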

These bounded variants of SP are known to admit approximation algorithms 
better than their general versions, the quality of the approximation being a func- 
tion of the bounds. An extensive body of algorithmic work has been devoted to 
these restricted problems (for example, [HS89]), but matching inapproximability 
results have only recently been explored (notably by Trevisan [Tre01]).

With some abuse of notation, one can say that the hardness of approximation
factor of SP is a monotone increasing function of each of the bounded
parameters: the edge size, the vertex degree and the colorability (of edges and
vertices). For example, an inapproximability factor for graphs of degree bounded by
3 also holds for graphs with degree bounded by 4. We next try to overview what is
known regarding the complexity of this problem as a function of these bounds.

1.2 Previous Results 

2-DM is known to be solvable in polynomial time, say by a reduction to network 
flow problems [Pap94]. Polynomial time algorithms are also known for graphs 
that are not bipartite [Edm65]. 
In contrast, for all $k \ge 3$, k-DM is NP-hard [Kar72, Pap94]. Furthermore, for
$k = 3$, the problem is known to be APX-hard [Kan91].

For large $k$ values, we are usually interested in the asymptotic dependence of
the approximation ratio (and inapproximability factor) on $k$. Currently, the best
polynomial time approximation algorithm for k-SP achieves an approximation
ratio of $\frac{k}{2}$ [HS89]. This is, to date, the best approximation algorithm for k-DM
as well.

Alon et al. [AFWD95] proved that for suitably large $k$, k-IS is NP-hard to
approximate to within $k^c$ for some $c > 0$. This was later improved to the
currently best asymptotic inapproximability factor of $\frac{k}{2^{O(\sqrt{\ln k})}}$ [Tre01]. All
hardness factors for k-IS in fact hold for k-DM as well (by a simple reduction).

The best known approximation algorithm for k-IS achieves an approximation
ratio of $O(k\log\log k/\log k)$ [Vis96]. For instances of k-IS with low bound values, the
best approximation algorithm achieves an approximation ratio of $(k+3)/5$ for
$k \ge 3$ (see [BF94, BF95]). For $k = 3$ and $k = 5$, [BK99, BK03] showed constant
inapproximability factors.

1.3 Our Contribution 

We improve the inapproximability factor for the variant k-DM, and show:

Theorem 1 (Asymptotic Hardness). It is NP-hard to approximate k-DM to
within $O(\frac{k}{\ln k})$.

In addition, we show inapproximability factors for 4-DM, 5-DM and 6-DM:

Theorem 2 (Hardness for Low Bound Values). For every $\varepsilon > 0$ it is
NP-hard to approximate 4-DM, 5-DM and 6-DM to within $\frac{54}{53} - \varepsilon$, $\frac{30}{29} - \varepsilon$ and $\frac{23}{22} - \varepsilon$
respectively.

These results extend to k-SP and to Independent-Set in $(k+1)$-claw-free graphs
($(k+1)$-ISCFG) (see [Hal98] for the definition of $(k+1)$-ISCFG and a reduction from
k-SP). They do not hold, however, for k-IS. Table 1 summarizes known
upper and lower bounds.

Recently there have been noteworthy developments, namely by [CC02, BK03,
CC03]. Inapproximability factors of the form $c - \varepsilon$ for 3-IS and 3-DM and for
4-IS and 4-DM were shown by [CC03].

1.4 Outline 

Some preliminaries are given in Section 2. Section 3 presents the notion of
hyper-graph dispersers. Section 4 contains the proof of the asymptotic hardness of
approximation for k-SP. Section 5 extends the proof to hold for k-DM. The proof
of the inapproximability factors for low values will be given in the full version. The
existence of a good hyper-disperser is proved in Section 6. The optimality of
its parameter is shown in Section 6.1. Section 7 contains a discussion of the
implications of our results, the techniques used and some open problems.
Problem                    Approximation Ratio   Prev. Inapproximability          Our Inapproximability
k-DM, (k+1)-ISCFG, k-SP    k/2 [HS89]            k/2^{O(sqrt(ln k))} [Tre01]      O(k/ln k)
4-DM, 4-SP, 5-ISCFG        2 [HS89]              ... - ε [BK99]                   54/53 - ε
5-DM, 5-SP, 6-ISCFG        5/2 [HS89]            ... - ε [BK99]                   30/29 - ε
6-DM, 6-SP, 7-ISCFG        3 [HS89]              ... - ε [BK99]                   23/22 - ε

Table 1. Approximation ratios versus inapproximability factors for k-DM and
related problems



2 Preliminaries 

In order to prove inapproximability of a maximization problem, one usually 
defines a corresponding gap problem. 

Definition 2 (Gap problems). Let A be a maximization problem. gap-A-[a,b]
is the following decision problem:

Given an input instance, decide whether 

— there exists a solution of fractional size at least b, or 

— every solution of the given instance is of fractional size smaller than a. 

If the size of the solution resides between these values, then any output suffices. 

Clearly, for any maximization problem, if gap-A-[a,b] is NP-hard, then it is
NP-hard to approximate A to within any factor smaller than $b/a$.
Our main result in this paper is derived by a reduction from the following
problem.

Definition 3 (Linear Equations). MAX-3-LIN-q is the following optimization
problem:

Input: A set of linear equations over GF(q), each depending on 3 variables.
Problem: Find an assignment that satisfies the maximum number of equations.
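For intuition about Definition 3, the following Python sketch (hypothetical names; exhaustive search, so only for very small instances) counts the equations satisfied by an assignment and computes the optimum of a MAX-3-LIN-q instance. Equations are encoded as tuples $(x, y, z, a)$ meaning $x + y + z = a \bmod q$.

```python
from itertools import product
from typing import Dict, List, Tuple

# (x, y, z, a) encodes the equation x + y + z = a (mod q).
Equation = Tuple[str, str, str, int]


def sat(eqs: List[Equation], A: Dict[str, int], q: int) -> List[Equation]:
    """The equations of the instance satisfied by the assignment A."""
    return [e for e in eqs if (A[e[0]] + A[e[1]] + A[e[2]] - e[3]) % q == 0]


def max_3_lin_bruteforce(eqs: List[Equation], q: int) -> Tuple[int, Dict[str, int]]:
    """Exact optimum by trying all q^n assignments (only for tiny instances)."""
    variables = sorted({v for e in eqs for v in e[:3]})
    best: Tuple[int, Dict[str, int]] = (-1, {})
    for values in product(range(q), repeat=len(variables)):
        A = dict(zip(variables, values))
        count = len(sat(eqs, A, q))
        if count > best[0]:
            best = (count, A)
    return best


phi = [("x", "y", "z", 0), ("x", "y", "w", 1), ("x", "z", "w", 2)]
print(max_3_lin_bruteforce(phi, q=3)[0])  # 3
```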

The following central theorem stems from the extensive line of research that culminated
in the celebrated PCP theorem (see [ALM+92, AS92]):

Theorem 3 (Hastad [Has97]). gap-MAX-3-LIN-q-$[\frac{1}{q} + \varepsilon, 1 - \varepsilon]$ is NP-hard
for every $q \in \mathbb{N}$ and $\varepsilon > 0$. Furthermore, the result holds for instances of
MAX-3-LIN-q in which the number of occurrences of each variable is a constant
(depending on $\varepsilon$ only), chosen from two possible values, and in which no variable
appears more than once in a single equation.
We denote an instance of MAX-3-LIN-q by $\Phi = \{\varphi_1, \dots, \varphi_m\}$. $\Phi$ is over the set of
variables $X = \{x_1, \dots, x_n\}$. Let $\Phi(x)$ be the set of all equations in $\Phi$ depending
on $x$. Denote by $Sat(\Phi, A)$ the set of all equations of $\Phi$ satisfied by an assignment
$A$. If $A$ is an assignment to an equation $\varphi \in \Phi(x)$, we denote by $A|_x$ the
corresponding assignment to $x$.

We next explain the reduction from linear equations to our problem. The
reduction gives an inapproximability factor for k-SP. We later amend it to hold
for k-DM too.



3 Hyper Dispersers 

The following definition is a generalization of disperser graphs. For definitions
and results regarding dispersers see [RTS00].

Definition 4 ($(q,\delta)$-Hyper-Graph Edge-Disperser). We call a hyper-graph
$H = (V,E)$ a $(q,\delta)$-Hyper-Graph Edge-Disperser if there exists a partition of its
edges

$E = E_1 \uplus \dots \uplus E_q$, $|E_1| = \dots = |E_q|$,

such that every large matching $M$ of $H$ is (almost) concentrated in one part of
the edges. Formally, there exists $i$ so that

$|M \setminus E_i| < \delta|E|$.



Lemma 1. For every $q > 1$ and $t > 1$ there exists a hyper-graph $H = (V,E)$
such that

– $V = [t] \times [d]$, where $d = O(q \ln q)$.

– $H$ is a $(q, \frac{1}{q^2})$-hyper-edge-disperser.

– $H$ is $d$-uniform and $d$-strongly-colorable.

– $H$ is $q$-regular and $q$-strongly-edge-colorable.

We denote this graph by $(t,q)$-$\mathcal{D}$.

This lemma is proved in Section 6.



4 Proof of the Asymptotic Inapproximability Factor for k-SP



This section provides a deterministic polynomial time reduction from
MAX-3-LIN-q to k-SP.

4.1 The Construction



Let $\Phi$ be an instance of MAX-3-LIN-q over the sets of variables
$X$ and $Y$, where each variable $x \in X$ and $y \in Y$ occurs a constant number
of times $c_X$ and $c_Y$, respectively (recall Theorem 3). We now describe how to
deterministically construct, in polynomial time, an instance of k-SP: the
hyper-graph $H_\Phi = (V,E)$.

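Let $\mathcal{D}_X$ be a $(c_X, q)$-$\mathcal{D}$ and $\mathcal{D}_Y$ be a $(c_Y, q)$-$\mathcal{D}$ (which exist by Lemma 1).
For every variable $x \in X$ (and $y \in Y$) we have a copy $\mathcal{D}_x$ of $\mathcal{D}_X$ (or $\mathcal{D}_y$ of $\mathcal{D}_Y$).
The vertices of $H_\Phi$ are the union of the vertices of all these hyper-dispersers.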
Formally, 

$V = X \times [c_X] \times [d] \;\cup\; Y \times [c_Y] \times [d]$,

namely,

$V = \{v_{x,\varphi,i} \mid x \in X \cup Y,\ \varphi \in \Phi(x),\ i \in [d]\}$.



The Edges of $H_\Phi$. We have an edge for each equation $\varphi \in \Phi$ and a satisfying
assignment to it. Consider an equation $\varphi = x + y + z = a \bmod q$, and a satisfying
assignment $A$ to that equation (note that there are $q^2$ such assignments, as
assigning the first two variables determines the third). The corresponding edge,
$e_{\varphi,A}$, is composed of three edges, one from the hyper-graph $\mathcal{D}_x$, one from $\mathcal{D}_y$
and the last from $\mathcal{D}_z$. Formally:

$e_{\varphi,A} = e_{x,\varphi,A|_x} \cup e_{y,\varphi,A|_y} \cup e_{z,\varphi,A|_z}$,

where $A|_x$ is the restriction of the assignment $A$ to the variable $x$, and $e_{x,\varphi,A|_x}$
is the edge $e[\varphi, A|_x]$ of $\mathcal{D}_x$ (and similarly for $y$ and $z$). The edges of $H_\Phi$ are

$E = \{e_{\varphi,A} \mid \varphi \in \Phi,\ A \text{ is a satisfying assignment to } \varphi\}$.

Clearly, the cardinality of $e_{\varphi,A}$ is $3d$ (and note that each of the three composing
edges participates in creating $q$ edges). This concludes the construction.
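The counting in the construction (there are $q^2$ satisfying assignments per equation, and each composing edge, i.e., each value of one variable, is shared by $q$ of the resulting hyper-edges) can be confirmed with a few lines of Python. The helper names are hypothetical, and the equation's variables are identified only by their position.

```python
from collections import Counter
from typing import List, Tuple


def satisfying_assignments(a: int, q: int) -> List[Tuple[int, int, int]]:
    """All (x, y, z) in Z_q^3 with x + y + z = a (mod q); z is forced by x and y."""
    return [(x, y, (a - x - y) % q) for x in range(q) for y in range(q)]


def edges_per_value(a: int, q: int, position: int) -> Counter:
    """For each value of the variable at `position`, how many hyper-edges e_{phi,A} use it."""
    return Counter(A[position] for A in satisfying_assignments(a, q))


q = 5
sols = satisfying_assignments(a=2, q=q)
print(len(sols), set(edges_per_value(2, q, position=0).values()))  # 25 {5}
```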

Notice that the construction is indeed deterministic, as each variable occurs
a constant number of times (see Theorem 3). Hence, the sizes of $\mathcal{D}_X$ and $\mathcal{D}_Y$
are constant and their existence (see Lemma 1) suffices, as one can enumerate all
possible hyper-graphs and verify their properties.

Claim (Completeness). If there is an assignment to $\Phi$ which satisfies a $1 - \varepsilon$ fraction of its
equations, then there is a matching in $H_\Phi$ of size at least $(1 - \varepsilon)|\Phi|$.

Proof. Let $A$ be an assignment that satisfies a $1 - \varepsilon$ fraction of the equations. Consider the
matching $M \subseteq E$ comprised of all edges corresponding to $A$, namely

$M = \{e_{\varphi,A|_\varphi} \mid \varphi \in Sat(\Phi, A)\}$.

Trivially, $|M| = |Sat(\Phi, A)| \ge (1 - \varepsilon)|\Phi|$, as we took one edge corresponding to each satisfied
equation. These edges are indeed a matching since for each variable, only edges
corresponding to a single assignment to that variable are taken.



On the Complexity of Approximating k-Dimensional Matching



Lemma 2 (Soundness). If every assignment to $\Phi$ satisfies at most a $\frac{1}{q} + \varepsilon$ fraction
of its equations, then every matching in $H_\Phi$ is of size $O\left(\frac{|\Phi|}{q}\right)$.

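Proof. Denote by $E_x$ the edges of $H_\Phi$ corresponding to equations $\varphi$ containing
the variable $x$, namely,

$E_x = \{e_{\varphi,A} \mid \varphi \in \Phi(x),\ e_{\varphi,A} \in E\}$.

Denote by $E_{x=a}$ the subset of $E_x$ corresponding to an assignment of $a$ to $x$, that
is,

$E_{x=a} = \{e_{\varphi,A} \in E_x \mid A|_x = a\}$.

Let $M$ be a matching of maximal size in $H_\Phi$. Let $A_{maj}$ be the most popular
assignment; that is, for every $x \in X \cup Y$ the value of $x$ is chosen so that it
corresponds to the maximal number of edges of $M$. Formally, choose

$A_{maj}(x) \in [q]$ such that $|E_{x=a} \cap M|$ is maximized for $a = A_{maj}(x)$.

Let $M_{maj}$ be the set of edges in $M$ that agree with $A_{maj}$, and $M_{min}$ be all the
other edges in $M$, namely

$M_{maj} = \{e_{\varphi,A} \in M \mid A = A_{maj}|_\varphi\}$,
$M_{min} = M \setminus M_{maj}$.

As $|Sat(\Phi, A_{maj})| \le (\frac{1}{q} + \varepsilon)|\Phi|$, we have $|M_{maj}| \le (\frac{1}{q} + \varepsilon)|\Phi|$.

From the disperser properties of $\mathcal{D}_x$ and $\mathcal{D}_y$ (derived from Lemma 1) we know
that for every $x \in X \cup Y$

$\sum_{a \ne A_{maj}(x)} |M_{min} \cap E_{x=a}| < \frac{1}{q^2}|E(\mathcal{D}_x)|$.   (*)

This means that

$\sum_{a \ne A_{maj}(x)} |M_{min} \cap E_{x=a}| < \frac{1}{q^3}|E_x|$,

as every edge of $\mathcal{D}_x$ is a subset of $q$ hyper-edges in $E_x$, but only one of these $q$
edges can be taken into $M$ as they share vertices (recall that $M$ is a matching).
Therefore, since $|E_x| = q^2|\Phi(x)|$ and $\sum_{x \in X \cup Y} |\Phi(x)| = 3|\Phi|$,

$|M_{min}| \le \sum_{x \in X \cup Y} \sum_{a \ne A_{maj}(x)} |M_{min} \cap E_{x=a}| < \sum_{x \in X \cup Y} \frac{1}{q^3}|E_x| = \frac{3}{q}|\Phi|$,

and thus

$|M| = |M_{min}| + |M_{maj}| \le (\frac{4}{q} + \varepsilon)|\Phi|$. □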



By Claim 4.1 and Lemma 2 we showed that gap-k-SP-$[\frac{4}{q} + \varepsilon, 1 - \varepsilon]$ is NP-hard.
Since each edge is of size $k = 3d = O(q \log q)$, it is NP-hard to approximate
k-SP to within $O(\frac{k}{\ln k})$.
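The last step, translating the gap into a factor in terms of $k$, can be spelled out as follows. This derivation is a sketch under stated assumptions: $d \le c\,q\ln q$ for a constant $c$ (the bound $d = O(q\ln q)$ of Lemma 1), a soundness bound $|M| \le C|\Phi|/q$ for a constant $C$ (Lemma 2), completeness $(1-\varepsilon)|\Phi|$, and $k$ large enough that $6c\ln k \ge 1$, so that $q \le k/(6c\ln k)$ implies $q \le k$ and hence $\ln q \le \ln k$. It shows that the ratio between the two cases grows like $k/\ln k$.

```latex
\begin{align*}
k = 3d &\le 3c\,q\ln q, \\
q \le \frac{k}{6c\ln k} &\;\Longrightarrow\;
  3c\,q\ln q \le 3c\cdot\frac{k}{6c\ln k}\cdot\ln k = \frac{k}{2} < k
  \quad(\text{using } \ln q \le \ln k), \\
\text{hence}\quad q &> \frac{k}{6c\ln k}, \\
\frac{(1-\varepsilon)\,|\Phi|}{C\,|\Phi|/q} = \frac{(1-\varepsilon)\,q}{C}
  &> \frac{(1-\varepsilon)\,k}{6cC\ln k} = \Omega\!\left(\frac{k}{\ln k}\right).
\end{align*}
```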
5 Extending the Proof for k-DM

The proof for k-DM follows the steps of the proof for k-SP. The difference is that we use three dispersers for each variable (instead of one): a different disperser for each location in the equations. Denote by $\Phi(x, l)$ the subset of $\Phi(x)$ where $x$ is the $l$'th variable in the equation (clearly $l \in [3]$). Note that w.l.o.g. we may assume that for every $x \in X$, $|\Phi(x,1)| = |\Phi(x,2)| = |\Phi(x,3)|$ (as we can take three copies of each equation, and shift the location of the variables).

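Let $\mathcal{D}_x = (c_x/3, q)$-$\mathcal{D}$ and $\mathcal{D}_y = (c_y/3, q)$-$\mathcal{D}$ (as stated in Lemma 1). For every variable $x \in X$ (or $y \in Y$) and position $l \in [3]$, we have a copy $\mathcal{D}_{x,l}$ of $\mathcal{D}_x$ (or $\mathcal{D}_{y,l}$ of $\mathcal{D}_y$).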



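$$V = (X \times V(\mathcal{D}_x) \times [3]) \cup (Y \times V(\mathcal{D}_y) \times [3])$$

namely,

$$V = \{ v_{x,\varphi,i} \mid x \in X \cup Y,\ \varphi \in \Phi(x),\ i \in [d] \}$$

where the index $i \in [d]$ is given by a strong-coloring of the edges with $q$ colors (recall that such a coloring exists as $(t, q)$-$\mathcal{D}$ is $q$-strongly colorable).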

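The Edges of $H_\Phi$. We have an edge for each equation $\varphi \in \Phi$ and a satisfying assignment to it. Consider an equation $\varphi = x + y + z = a \bmod q$, and a satisfying assignment $A$ to that equation. The corresponding edge $e_{\varphi,A}$ is composed of three edges, one from the hypergraph $\mathcal{D}_{x,1}$, one from $\mathcal{D}_{y,2}$ and the last from $\mathcal{D}_{z,3}$. Formally:

$$e_{\varphi,A} = e_{x,\varphi,A|_x} \cup e_{y,\varphi,A|_y} \cup e_{z,\varphi,A|_z}$$

where $e_{x,\varphi,A|_x}$ is the edge $e[\varphi, A|_x]$ of $\mathcal{D}_{x,1}$, $e_{y,\varphi,A|_y}$ is the edge $e[\varphi, A|_y]$ of $\mathcal{D}_{y,2}$ and $e_{z,\varphi,A|_z}$ is the edge $e[\varphi, A|_z]$ of $\mathcal{D}_{z,3}$. The edges of $H_\Phi$ are

$$E = \{ e_{\varphi,A} \mid \varphi \in \Phi,\ A \text{ is a satisfying assignment to } \varphi \}$$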
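For intuition about the sizes involved, the short Python sketch below (an illustration, not part of the original construction) enumerates the satisfying assignments of a single equation $x + y + z \equiv a \pmod q$. There are exactly $q^2$ of them, so $|E| = q^2|\Phi|$, and fixing the value of $x$ leaves exactly $q$ of them, which is why each edge of a disperser $\mathcal{D}_x$ lies in $q$ hyperedges. Encoding an equation as a tuple `(x, y, z, a)` and the function names are assumptions made only for this example.

```python
from itertools import product


def satisfying_assignments(a, q):
    """All (x, y, z) in [q]^3 with x + y + z = a (mod q)."""
    return [(x, y, z) for x, y, z in product(range(q), repeat=3)
            if (x + y + z) % q == a % q]


def count_edges(equations, q):
    """Number of hyperedges e_{phi,A}: one per equation and satisfying assignment.

    Each equation is a hypothetical tuple (x_name, y_name, z_name, a).
    """
    return sum(len(satisfying_assignments(a, q)) for (_, _, _, a) in equations)


def extensions_of(a, q, b):
    """Satisfying assignments with x fixed to b: y is free and z is forced."""
    return [A for A in satisfying_assignments(a, q) if A[0] == b]


# Example: two equations over Z_5 give 2 * 5**2 = 50 hyperedges.
eqs = [("x1", "y1", "z1", 0), ("x2", "y1", "z2", 3)]
print(count_edges(eqs, 5))             # 50
print(len(extensions_of(1, 4, b=2)))   # 4
```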

This concludes the construction for k-DM. We next show that the graph 
constructed is indeed a k-DM instance: 

Proposition 1. $H_\Phi$ is $3d$-strongly-colorable.

Proof. We show how to partition $V$ into $3d$ independent sets of equal size. Let the sets be $P_{i,l}$, where $i \in [d]$ and $l \in [3]$:

$$P_{i,l} = \{ v_{x,\varphi,i} \mid x \in X \cup Y,\ \varphi \in \Phi(x, l) \}$$

$\{P_{i,l}\}$ is clearly a partition of the vertices, as each vertex belongs to a single part. We now explain why each part is an independent set. Let $P_{i,l}$ be an arbitrary part, and let $e_{\varphi,A} \in E$ be an arbitrary edge, where $\varphi = x + y + z = a \bmod q$:

$$e_{\varphi,A} = e_{x,\varphi,A|_x} \cup e_{y,\varphi,A|_y} \cup e_{z,\varphi,A|_z}$$

$P_{i,l} \cap e_{\varphi,A}$ may contain vertices corresponding to only one of the variables $x, y, z$, since it contains vertices corresponding to a single location (first, second or third). Let that variable be, w.l.o.g., $x$. The edge $e_{x,\varphi,A|_x}$ contains exactly one vertex from each of the $d$ parts, as the graph $\mathcal{D}_{x,l}$ is $d$-partite. Therefore, the set $P_{i,l} \cap e_{\varphi,A}$ contains exactly one vertex. Since $|P_{i,l} \cap e_{\varphi,A}| = 1$ for every edge and every set $P_{i,l}$, the graph is $3d$-partite-balanced.

The completeness claim for k-SP (claim 4.1) holds here too. The soundness 
lemma for k-SP holds with minor changes: 

Lemma 3. [Soundness] If every assignment to $\Phi$ satisfies at most a $\frac{1}{q} + \varepsilon$ fraction of its equations, then every matching in $H_\Phi$ is of size $O(|\Phi|/q)$.

Proof. We repeat the soundness proof of k-SP but the definition of the most- 
popular assignment is slightly different, and takes into account the three different 
dispersers per variable. 

Denote by $E_{x,l}$ the edges of $H_\Phi$ corresponding to equations $\varphi$ containing the variable $x$ in location $l$, namely,

$$E_{x,l} = \{ e_{\varphi,A} \in E \mid \varphi \in \Phi(x, l) \}$$

Denote by $E_{x=a,l}$ the subset of $E_{x,l}$ corresponding to an assignment of $a$ to $x$, that is,

$$E_{x=a,l} = \{ e_{\varphi,A} \in E_{x,l} \mid A|_x = a \}$$

Let $M$ be a matching of maximal size. Let $A_{maj}$ be the most popular of the most popular assignments. That is, for every $x \in X \cup Y$ choose the location (of equations of edges of $M$) in which $x$ appears the maximal number of times:

$$l(x) \in [3] \text{ such that } |E_{x,l} \cap M| \text{ is maximized} \qquad (1)$$

Then choose an assignment for $x$ such that it corresponds to the maximal number of those edges. Formally, choose

$$A_{maj}(x) = a \in [q] \text{ such that } |E_{x=a,l(x)} \cap M| \text{ is maximized}$$

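As before, let $M_{maj}$ be the set of edges in $M$ that agree with $A_{maj}$, and $M_{min}$ be all the other edges in $M$, namely

$$M_{maj} = \{ e_{\varphi,A} \in M \mid A = A_{maj}|_\varphi \}, \qquad M_{min} = M \setminus M_{maj}$$

For the exact same reasons as in the k-SP proof, we have

$$|M_{maj}| \le \left(\frac{1}{q} + \varepsilon\right)|\Phi| \qquad (2)$$

and for every $x$,

$$\sum_{a \neq A_{maj}(x)} |M_{min} \cap E_{x=a,l(x)}| \le \frac{1}{q^3}|E_{x,l(x)}| \qquad (3)$$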
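Therefore, by (1) we have

$$\begin{aligned}
|M| &\le \sum_{x,l} |M_{maj} \cap E_{x,l}| + \sum_{x,l,\ a \neq A_{maj}(x)} |M_{min} \cap E_{x=a,l}| \\
&\le 3 \cdot \sum_{x} |M_{maj} \cap E_{x,l(x)}| + 3 \cdot \sum_{x,\ a \neq A_{maj}(x)} |M_{min} \cap E_{x=a,l(x)}| \\
&\le 3 \cdot |M_{maj}| + 3 \cdot \sum_{x,\ a \neq A_{maj}(x)} |M_{min} \cap E_{x=a,l(x)}|
\end{aligned}$$

thus by (2) and (3), using $\sum_x |E_{x,l(x)}| = |E| = q^2|\Phi|$ (by the w.l.o.g. assumption that each variable appears equally often in each location),

$$|M| \le 3\left(\frac{1}{q} + \varepsilon\right)|\Phi| + \frac{3}{q^3}\, q^2 |\Phi| = \left(\frac{6}{q} + 3\varepsilon\right)|\Phi|$$

∎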



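By Claim 4.1 and Lemma 3 we showed that Gap-k-DM (distinguishing instances that have a matching of size $(1-\varepsilon)|\Phi|$ from instances in which every matching has size $O(|\Phi|/q)$) is NP-hard; thus, since $k = 3d = O(q \log q)$, it is NP-hard to approximate k-DM to within $O\left(\frac{k}{\ln k}\right)$.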



6 Hyper-Dispersers 



In this section, we prove lemma 1. As stated before, these are generalizations 
of disperser graphs. In section 6.1, we prove that these are the best (up to a 
constant) parameters for a hyper-disperser one can hope to achieve. 

Lemma 1. For every $q > 1$ and $t > 1$ there exists a hypergraph $H = (V, E)$ such that

- $V = [t] \times [d]$, where $d = O(q \ln q)$.

- $H$ is a $(q, \frac{1}{q^2})$-hyper-edge-disperser.

- $H$ is $d$-uniform and $d$-strongly-colorable.

- $H$ is $q$-regular and $q$-strongly-edge-colorable.

We denote this graph by $(t, q)$-$\mathcal{D}$.



Proof. Let

$$V = [t] \times [d]$$

and denote $V_i = [t] \times \{i\}$.

We next randomly construct the edges of the hypergraph, so that it is $d$-uniform and $q$-regular. Let $S_t$ be the set of all permutations over $t$ elements, and let

$$\pi_{i_1,i_2} \in_R S_t, \qquad (i_1, i_2) \in [q] \times [d]$$






(that is, $qd$ permutations, chosen uniformly from $S_t$). Define

$$e[i,j] = \{ (\pi_{j,1}(i), 1), (\pi_{j,2}(i), 2), \ldots, (\pi_{j,d}(i), d) \} \qquad (4)$$



and let

$$E = \{ e[i,j] \mid (i,j) \in [t] \times [q] \}$$

Hence $|E| = tq$. Define a partition of the edges as follows: $E_j = \{ e[i,j] \mid i \in [t] \}$. Thus $|E_1| = \cdots = |E_q| = t$ and each set of edges $E_j$ covers every vertex exactly once. Therefore, $H$ is $q$-strongly-edge-colorable. On the other hand, every edge contains exactly one vertex from each set of vertices $V_i$. Thus $H$ is $d$-strongly-colorable.
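The random construction above can be mirrored directly in code. The Python sketch below (0-indexed, with an arbitrary seed; the function names are hypothetical) samples the $qd$ permutations, builds the edges $e[i,j]$ of equation (4), and checks the two structural facts just argued: each class $E_j$ covers every vertex exactly once, and every edge meets each part $V_i$ in exactly one vertex. It does not test the disperser property itself, which is the probabilistic part of the proof.

```python
import random


def random_hyper_disperser(t, q, d, seed=0):
    """Sample the hypergraph of Lemma 1 (0-indexed): vertices (v, l) in [t] x [d].

    pi[j][l] is a uniformly random permutation of range(t) for (j, l) in [q] x [d],
    and edge e[i, j] = {(pi[j][l][i], l) : l in [d]}, following equation (4).
    """
    rng = random.Random(seed)
    pi = [[rng.sample(range(t), t) for _ in range(d)] for _ in range(q)]
    return {(i, j): frozenset((pi[j][l][i], l) for l in range(d))
            for i in range(t) for j in range(q)}


def check_structure(edges, t, q, d):
    """Check the two colorability arguments from the proof."""
    vertices = {(v, l) for v in range(t) for l in range(d)}
    for j in range(q):
        # Color class E_j = {e[i, j] : i in [t]} must cover every vertex exactly once.
        covered = [v for i in range(t) for v in edges[(i, j)]]
        if len(covered) != len(vertices) or set(covered) != vertices:
            return False
    for e in edges.values():
        # Every edge has exactly one vertex in each part V_l = [t] x {l}.
        if sorted(l for (_, l) in e) != list(range(d)):
            return False
    return True


H = random_hyper_disperser(t=10, q=3, d=5)
print(len(H), check_structure(H, 10, 3, 5))   # 30 True
```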

We next show that with high probability $H$ has the disperser property, namely, every matching $M$ of $H$ is concentrated on a single part of the edges, except for maybe $\frac{1}{q^2}|E| = \frac{t}{q}$ edges of $M$. Denote by $P$ the probability that $H$ does not have the disperser property. Let $\mathcal{M}$ be the family of all subsets $M \subseteq E$ of interest, that is, a family (of non-concentrated subsets of edges) that ought to be inspected in order to determine whether $H$ has the disperser property, namely,

$$\mathcal{M} = \left\{ M \;\middle|\; M \subseteq E,\ |M| = \frac{2t}{q},\ \exists i,\ |M \setminus E_i| = \frac{t}{q} \right\}$$

(note that if a set $M$ is a matching, so is any subset of $M$, hence it suffices to check $H$ for all subsets $M \in \mathcal{M}$). Denote by $\Pr[M]$ the probability (over the random choice of $H$) that $M$ is a matching. By the union bound,



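$$P = \Pr_H[\exists M \in \mathcal{M},\ M \text{ is a matching}] \le \sum_{M \in \mathcal{M}} \Pr[M] \le |\mathcal{M}| \Pr[M] \qquad (5)$$

where $M \in \mathcal{M}$ is the set which maximizes $\Pr[M]$. Clearly,

$$|\mathcal{M}| \le q \binom{t}{t/q} \binom{tq}{t/q} \le q (eq)^{t/q} (eq^2)^{t/q} \le (eq)^{3t/q} \qquad (6)$$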

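We next bound $\Pr[M]$. Let $M_i = M \cap E_i$. Let $B_{i,j}$ be the event that the sets of edges $M_i$ and $M_j$ do not share a vertex, and $A_i = \bigcap_{j<i} B_{i,j}$. Then

$$\Pr[M] = \Pr\Big[\bigcap_i A_i\Big] = \prod_i \Pr\Big[A_i \;\Big|\; \bigcap_{l<i} A_l\Big]$$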



Note, however, that the event $A_i$ is independent of the event $\bigcap_{l<i} A_l$, as $A_i$ is determined by (the independently chosen permutations) $\{\pi_{i,j} \mid j \in [d]\}$, whereas $\bigcap_{l<i} A_l$ is determined by the permutations $\{\pi_{l,j} \mid l < i,\ j \in [d]\}$. Thus

$$\Pr[M] = \prod_i \Pr[A_i] \qquad (7)$$
Let $C_{i,j}$ be the event that there is no collision (common vertex) of $M_i$ and $\bigcup_{l<i} M_l$ on the subset of vertices $V_j$ (clearly $A_i = \bigcap_j C_{i,j}$). Hence, as for $j_1 \neq j_2$, $C_{i,j_1}$ and $C_{i,j_2}$ are determined by independent sets of permutations (recall (4)), we have

$$\Pr[A_i] = \prod_{j \in [d]} \Pr[C_{i,j}] = (\Pr[C_{i,1}])^d \le \left(1 - \frac{\sum_{l<i}|M_l|}{t}\right)^{d|M_i|}$$

where the sum $\sum_{l<i}|M_l|$ in the rightmost expression relies on assuming no collisions between edges of $\bigcup_{l<i} M_l$ (which is implied by $\bigcap_{l<i} A_l$). Thus by equation (7) we have (as $1 - x \le e^{-x}$):

$$\Pr[M] \le \prod_i e^{-\frac{d}{t}|M_i| \sum_{j<i}|M_j|} = e^{-\frac{d}{t}\sum_{i=2}^{q}\left(\sum_{j<i}|M_j|\right)|M_i|}$$

Under the constraint that $M \in \mathcal{M}$, the sum $\sum_{i=2}^{q}\left(\sum_{j<i}|M_j|\right)|M_i|$ is minimized for $|M_2| = |M_3| = \frac{t}{q}$, hence

$$\Pr[M] \le e^{-\frac{dt}{q^2}} \qquad (8)$$

Therefore, by equations (5), (6), (8),

$$P \le (eq)^{\frac{3t}{q}} e^{-\frac{dt}{q^2}}$$

Any $d$ which guarantees that $(eq)^{3t/q} e^{-dt/q^2} \ll 1$ suffices (for example $d \ge 20 q \ln q$), as then $P < 1$; thus there exists $H$ with the disperser properties.
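The two quantitative steps of this argument can be sanity-checked numerically. In the Python sketch below (an illustration only; the function names are ours), `no_collision_exact` computes the exact probability that the $m$ vertices that $M_i$ places in $V_j$ avoid $s$ already-occupied vertices under a random permutation, which never exceeds the bound $(1 - s/t)^m$ used above, and `log_failure_bound` evaluates the natural logarithm of the final bound, taken in the form $(eq)^{3t/q} e^{-dt/q^2}$. With $d = 20 q \ln q$ this logarithm is negative, i.e. $P < 1$.

```python
from math import comb, e, log


def no_collision_exact(t, s, m):
    """Pr that a random permutation sends m fixed indices outside a fixed s-subset of [t]."""
    return comb(t - s, m) / comb(t, m)


def no_collision_bound(t, s, m):
    """The upper bound (1 - s/t)^m used for Pr[C_{i,j}]."""
    return (1 - s / t) ** m


def log_failure_bound(t, q, d):
    """ln of (eq)^(3t/q) * exp(-d t / q^2), the bound on P from (5), (6) and (8)."""
    return (3 * t / q) * log(e * q) - d * t / q ** 2


print(no_collision_exact(100, 20, 10) <= no_collision_bound(100, 20, 10))  # True
for q in (2, 5, 10, 50):
    d = 20 * q * log(q)
    print(q, log_failure_bound(1000, q, d) < 0)  # True for each q
```

The exact probability is the product $\prod_{k<m} \frac{t-s-k}{t-k}$, and each factor is at most $\frac{t-s}{t}$, which is why the bound holds for every choice of parameters.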

6.1 Optimality of Hyper-Disperser Construction 

We now turn to see why the hyper-disperser from Lemma 1 has optimal parameters. We base our observation on a lemma from [RTS00]:

Definition 5. A bipartite graph $G = (V_1, V_2, E)$ is called a $\delta$-disperser if for every $U_1 \subseteq V_1$, $U_2 \subseteq V_2$ with $|U_1| \ge \delta|V_1|$ and $|U_2| \ge \delta|V_2|$, the subset $U_1 \cup U_2$ is not an independent set.

Lemma 4. Every bipartite $d$-regular $\frac{1}{k}$-disperser must satisfy $d = \Omega(k \ln k)$.



Proposition 2. Every $d$-uniform, $q$-strongly-edge-colorable, $q$-regular, $d$-strongly-colorable $(q, \frac{1}{q^2})$-hyper-edge-disperser must satisfy $d = \Omega(q \ln q)$.

Proof. We prove that if there exists such a hypergraph which satisfies $d = o(q \ln q)$, then there exists a bipartite $o(q \ln q)$-regular $\frac{1}{q}$-disperser, in contrast to Lemma 4. We transform a $d$-partite, $d$-uniform, $q$-regular, $q$-strongly-edge-colorable $(q, \frac{1}{q^2})$-hyper-disperser $H = (V_H, E_1, E_2, \ldots, E_q)$ into a bipartite $d$-regular $\frac{1}{q}$-disperser $G = (V_1, V_2, E_G)$ in the following way. Let

$$V_1 = E_1, \qquad V_2 = E_2, \qquad E_G = \{ (e_1, e_2) \mid e_1 \cap e_2 \neq \emptyset \}$$

Obviously $G$ is a bipartite $d$-regular graph (we allow multi-edges). In addition, suppose two sets $U_1 \subseteq V_1$ and $U_2 \subseteq V_2$ of fractional sizes

$$\delta_1 = \frac{|U_1|}{|V_1|} = \frac{1}{q}, \qquad \delta_2 = \frac{|U_2|}{|V_2|} = \frac{1}{q}$$

are an independent set in $G$. Then the corresponding sets of edges in $H$ are disjoint and each is of fractional size $\frac{1}{q^2}$ (relative to $|E|$), thus contradicting the fact that $H$ is a $(q, \frac{1}{q^2})$-hyper-disperser.
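To see concretely why $G$ is $d$-regular, the Python sketch below builds $G$ from the first two color classes of a randomly sampled hypergraph with the structure of Lemma 1, counting a pair $(e_1, e_2)$ once per shared vertex (our reading of "we allow multi-edges"). The sampler and the function names are hypothetical. Every left and right degree comes out equal to $d$, because each vertex lies in exactly one edge of each class.

```python
import random
from collections import Counter


def random_hypergraph(t, q, d, seed=1):
    """Hypothetical sampler with the structure of Lemma 1 (equation (4), 0-indexed)."""
    rng = random.Random(seed)
    pi = [[rng.sample(range(t), t) for _ in range(d)] for _ in range(q)]
    return {(i, j): frozenset((pi[j][l][i], l) for l in range(d))
            for i in range(t) for j in range(q)}


def bipartite_from_classes(edges, t, c1=0, c2=1):
    """Multigraph G on E_{c1} x E_{c2}: pair (i, i2) gets one copy per shared vertex."""
    owner = {}  # vertex -> index of the unique edge of class c2 that covers it
    for i in range(t):
        for v in edges[(i, c2)]:
            owner[v] = i
    return Counter((i, owner[v]) for i in range(t) for v in edges[(i, c1)])


def degrees(g, t):
    """Left and right degrees of the multigraph, counting multiplicities."""
    left, right = Counter(), Counter()
    for (a, b), mult in g.items():
        left[a] += mult
        right[b] += mult
    return [left[i] for i in range(t)], [right[i] for i in range(t)]


H = random_hypergraph(t=12, q=3, d=4)
left, right = degrees(bipartite_from_classes(H, 12), 12)
print(set(left), set(right))  # {4} {4}
```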

7 Discussion 

An interesting property of our construction (for both the asymptotic and the low-bound-values results) is the almost perfect completeness. This property refers to the fact that the matching proved to exist in the completeness Claim 4.1 is an almost perfect matching, that is, it covers a $1 - \varepsilon$ fraction of the vertices. Knowing the location of a gap is interesting by itself and may prove useful (in particular if it is extreme on either the completeness or the soundness parameters, see for example [Pet94]).
In fact, applying our reduction on other PCP variants instead of Max-3-Lin-q 
(e.g. parallel repetition of 3-SAT) yields perfect completeness for k-DM (but 
with weaker hardness factors). 

The ratio between the asymptotic inapproximability factor presented herein for k-DM and k-SP, and the tightest approximation algorithm known, was reduced to $O(\ln k)$. The open question of where in the range from $\frac{k}{2}$ to $O(\frac{k}{\ln k})$ the approximability threshold lies is interesting by itself, as well as its implications for the difference between k-DM and k-IS. The current asymptotic inapproximability factor of $O(\frac{k}{\ln k})$ for k-DM approaches the tightest approximation ratio known for k-IS, namely $O(k \log\log k / \log k)$ [Vis96]. Thus, a small improvement
in either the approximation ratio or the inapproximability factor will show these 
problems to be of inherently different complexity. 

An improvement in the low-bound-values hardness factor for k-DM may also separate these problems. The tightest known approximation algorithm for low bound values of k-IS achieves an approximation ratio of $(k+3)/5$ for $k \ge 3$ [BF94, BF95]. Thus, improving the low bound values factors up to $\frac{6}{5} + \varepsilon$ for 3-DM or $\frac{7}{5} + \varepsilon$ for 4-DM suffices for separating these problems.
Abstract. In this paper we consider the classic problem of finding the
market equilibrium prices under linear utility functions. A notion of ap- 
proximate market equilibrium was proposed by Deng, Papadimitriou and 
Safra [5]. Using this notion, we present the first fully polynomial-time ap- 
proximation scheme for finding a market equilibrium price vector. The 
main tool in our algorithm is the polynomial-time algorithm of Devanur 
et al. [6] for a variant of the problem in which there is a clear demarca- 
tion between buyers and sellers. Their algorithm is used as a subroutine 
in our algorithm. 



1 Introduction 

The behavior of a complex marketplace with multiple goods, buyers, and sellers, 
can only be understood by analyzing the system in its entirety. In practice, such 
markets tend toward a delicate balance of supply and demand as determined 
by the agents’ fortunes and utilities. The study of this equilibrium situation is 
known as general equilibrium theory, and was first formulated by Leon Walras 
in 1874 [12]. In the Walrasian model, the market consists of a set of agents, each 
with an initial endowment of goods, and a function describing the utility each one 
will derive from any allocation. The initial allocation could be sub-optimal, and 
the task of exchanging goods to mutually increase the utilities might be fairly 
complicated. A functioning market accomplishes this exchange by determining 
appropriate prices for the goods. Given these prices, all agents independently 
maximize their own utility by selling their endowments and buying the best 
bundle of goods they can afford. This new allocation will be an equilibrium 
allocation if the total demand for every good equals its supply. The prices that 
induce this equilibrium are called the market-clearing prices, and the equilibrium 
itself is called a market equilibrium. 

Much work has been devoted to establishing the existence of market equilib- 
ria [1, 11]. This difficult problem is approached by placing different assumptions 
on the endowment and utility functions of the agents. The seminal work of Ar- 
row and Debreu [1] proves the existence of market equilibria in the quite general 
setting of concave utility functions by applying Kakutani’s fixed point theorem. 

S. Arora et al. (Eds.): APPROX 2003+RANDOM 2003, LNCS 2764, pp. 98-108, 2003.

© Springer-Verlag Berlin Heidelberg 2003
This generality comes at a high price: the proof is non-constructive and so does
not give an algorithm to compute the equilibrium prices. Yet computing these 
prices can be of considerable importance for predicting the market. For exam- 
ple, in order to determine the effects of a change in a tariff, we must be able to 
compute the equilibrium prices before and after the tariff change. Equilibrium 
prices also have applications in computer science. Kelly and Vazirani [8] show 
that the rate control for elastic traffic in a network can be reduced to a market 
equilibrium problem. 

Despite the impressive progress in computing equilibrium prices [2, 3, 4, 10], 
especially the seminal work of Scarf [10], polynomial-time algorithms have evaded
researchers. In the special case of linear utility functions, Deng, Papadimitriou, 
and Safra [5] (see also [9]) provide a polynomial-time algorithm when the num- 
ber of goods or agents is bounded. Devanur et al. [6] obtain a polynomial-time 
algorithm via a primal-dual-type approach when there is a demarcation between 
sellers and buyers. However, the question of existence of a polynomial-time al- 
gorithm for the general case is still open. In this paper, we present the first 
fully polynomial-time approximation scheme for this problem. Since the market 
equilibrium problem is not an optimization problem, we need to clarify what 
we mean by an approximate market equilibrium. For this, we use a definition 
proposed by Deng, Papadimitriou, and Safra [5]. According to this definition, an
approximate market equilibrium is a price vector for which there is an alloca- 
tion of goods to the agents that approximately clears the market and each agent 
is approximately maximally happy with the allocation (subject to her budget 
constraint). The precise definition is presented in Section 2. 

In the market equilibrium problem, all agents are buyers as well as sellers. 
The algorithm of Devanur et al. [6] works only when the buyers and sellers are 
different. The reason is that their algorithm requires that the buyers’ budgets 
be known beforehand. The main idea of our algorithm is to overcome this diffi- 
culty by running in iterations, and letting the budget of an agent in the current 
iteration be the revenue she generated in the previous iteration. The algorithm 
of Devanur et al. [6] requires an initial setting of prices in which no good is un- 
dersold. We satisfy this requirement by adding a dummy buyer who has enough 
money to buy the residual goods. 

The rest of this paper is organized as follows. In Section 2, we precisely define 
the model, our assumptions, and the notion of approximate market equilibria. 
In Section 3, we present our algorithms. In Section 4, we prove that one of our 
algorithms computes an approximate equilibrium in polynomial time. Finally, 
in Section 5 we conclude with a summary of our results and a discussion of 
remaining open questions. 

2 Definitions and Preliminaries 

Consider a market consisting of n agents trading m types of divisible goods. 
Initially, each agent $i$ has an endowment $w^i \in \mathbb{R}^m$ of goods (i.e., $w^i_j$ indicates the amount of good $j$ that agent $i$ initially has). We assume, without loss of generality, that in total there is one unit of each type of good in the initial endowments (i.e., $\sum_i w^i_j = 1$ for every good $j$). Also, each agent $i$ has a utility function $u_i : \mathbb{R}^m \to \mathbb{R}^+$. That is, if $x \in \mathbb{R}^m$ is a vector that specifies how much of each good agent $i$ has, then $u_i(x)$ indicates the utility (or happiness) of agent $i$. If a price of $p_j$ dollars is set for one unit of good $j$, then agent $i$ can sell her endowment for a total of $\sum_j p_j w^i_j$ dollars. Using this money, she can buy a bundle $x \in \mathbb{R}^m$ of goods. Since each agent is trying to maximize her utility, the bundle $x$ is a solution to the following maximization program.

$$\begin{array}{ll} \text{maximize} & u_i(x) \\ \text{subject to} & \sum_{j=1}^{m} p_j x_j \le \sum_{j=1}^{m} p_j w^i_j \end{array} \qquad (1)$$

Such a solution $x$ is called an optimal bundle for agent $i$. If the function $u_i$ is strictly concave (i.e., for every $x \neq x' \in \mathbb{R}^m$, $u_i((x + x')/2) > (u_i(x) + u_i(x'))/2$),
then there is a unique optimal bundle for agent i. The Arrow-Debreu theorem 
states the following. 

Theorem A (Arrow and Debreu [1]). Consider the above setting and assume that the $u_i$'s are strictly concave. Then there is a price vector $p^*$ such that if each agent buys the optimal bundle with respect to $p^*$, then the market clears. In other words, if $x^i \in \mathbb{R}^m$ is the optimal bundle for agent $i$ with respect to $p^*$, then for every good $j$, $\sum_i x^i_j = \sum_i w^i_j$.

If the utility functions are concave but not strictly concave (e.g., if they 
are linear), then the optimal bundle is not necessarily unique. In this case, the 
Arrow-Debreu theorem says that there is a price vector $p^*$ and a bundle $x^i$ for each agent $i$, such that $x^i$ is an optimal bundle for $i$ with respect to $p^*$, and if for every $i$, agent $i$ buys the bundle $x^i$, then the market clears.

The proof of the Arrow-Debreu theorem is existential and uses a fixed point 
theorem. Therefore, a natural question is whether one can efficiently compute the 
equilibrium prices that are guaranteed to exist by the Arrow-Debreu theorem. 
This problem is still widely open. 

In [6], Devanur et al. present a polynomial-time algorithm that computes the 
market-clearing prices in a market with the following conditions: 

1. All utility functions are linear, i.e., $u_i(x) = \sum_j u_{ij} x_j$ for non-negative constants $u_{ij}$.

2. There is a distinction between buyers and sellers in the market. More pre- 
cisely, there are m sellers, each having one unit of a different type of good, 
and n buyers in the market. Each buyer i has a given budget $e_i$, and wants
to buy a certain amount of each good to maximize her utility, subject to her 
budget constraint. We will call a market with this property a dichotomous 
market. 

In this paper, we refer to the algorithm of Devanur et al. [6] as the DPSV 
algorithm. The idea of the DPSV algorithm is to start from a price vector $p^0$
satisfying an invariant stated below, and keep increasing the prices subject to 
not violating the invariant, until we converge to the equilibrium prices. In order
to introduce the invariant, we first define the concept of the equality subgraph. 

Let $p$ be a price vector. For each agent $i$, let $\alpha_i = \max_j \{u_{ij}/p_j\}$ ($\alpha_i$ is agent $i$'s bang per buck). The equality subgraph $N(p)$ is a network whose vertex set consists of a source $s$, a vertex $a_j$ for each good $j$, a vertex $b_i$ for each buyer $i$, and a sink $t$. Let $A$ and $B$ denote the sets of $a_j$'s and $b_i$'s, respectively. There is an edge from $s$ to each $a_j \in A$ of capacity $p_j$ (the price of $j$), and an edge from each $b_i \in B$ to $t$ of capacity $e_i$ (the budget of $i$). Also, for each buyer $i$ and good $j$, if $\alpha_i = u_{ij}/p_j$, then we put an edge from $a_j$ to $b_i$ of infinite capacity. This edge is called an equality edge. Notice that by this definition a bundle is optimal for buyer $i$ with respect to the prices $p$ if and only if its total price is equal to the budget of $i$, and it only contains goods that have an equality edge to $b_i$ in $N(p)$.
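A direct transcription of these definitions is sketched below in Python. It assumes strictly positive prices, takes utilities and prices as lists indexed by buyer and good, and the function names are ours. Exact rational arithmetic via `fractions.Fraction` avoids spurious mismatches in the test $\alpha_i = u_{ij}/p_j$ that floating point could cause.

```python
from fractions import Fraction


def bang_per_buck(u, p):
    """alpha_i = max_j u[i][j] / p[j] for each buyer i (all prices assumed positive)."""
    return [max(Fraction(uij) / Fraction(pj) for uij, pj in zip(ui, p)) for ui in u]


def equality_edges(u, p):
    """Set of pairs (j, i) such that a_j -> b_i is an equality edge of N(p)."""
    alpha = bang_per_buck(u, p)
    return {(j, i) for i, ui in enumerate(u) for j, pj in enumerate(p)
            if Fraction(ui[j]) / Fraction(pj) == alpha[i]}


u = [[2, 1], [1, 1]]  # u[i][j]: utility of buyer i for one unit of good j
p = [1, 1]
print(bang_per_buck(u, p))            # [Fraction(2, 1), Fraction(1, 1)]
print(sorted(equality_edges(u, p)))   # [(0, 0), (0, 1), (1, 1)]
```

Buyer 1 values both goods equally at these prices, so she gets an equality edge from each good, while buyer 0 is only interested in good 0.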

Now, we can state the invariant of the DPSV algorithm. 

Invariant 1. The prices $p$ are such that $(s, A \cup B \cup t)$ is a min-cut in $N(p)$.

For a price vector $p$ and a subset $S$ of goods, we define $\Gamma_p(S)$ as the set of buyers $i$ such that $N(p)$ contains an edge from $a_j$ to $b_i$ for some $j \in S$. In other words, $\Gamma_p(S)$ is the set of buyers who are interested in at least one of the goods in $S$ at price $p$. For any $S \subseteq A$, the money of $S$ (denoted by $m_p(S)$) is the sum of the prices of the goods in $S$. Similarly, the money of a subset $S$ of $B$ (denoted by $m_e(S)$) is the sum of the budgets of the buyers in $S$.

By the above definition, it is straightforward to see that Invariant 1 is equiv- 
alent to the following. 

Invariant 2. The prices $p$ are such that for every $S \subseteq A$, we have $m_p(S) \le m_e(\Gamma_p(S))$.
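For small instances, Invariant 2 can be checked by brute force over all sets of goods, as in the Python sketch below. It is exponential in $m$ and meant only to illustrate the definition, not to suggest how DPSV maintains the invariant. The equality-edge helper repeats the definition of $N(p)$ given above, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def equality_edges(u, p):
    """Pairs (j, i) with u[i][j] / p[j] equal to buyer i's bang per buck."""
    alpha = [max(Fraction(x) / Fraction(y) for x, y in zip(ui, p)) for ui in u]
    return {(j, i) for i, ui in enumerate(u) for j in range(len(p))
            if Fraction(ui[j]) / Fraction(p[j]) == alpha[i]}


def satisfies_invariant_2(u, p, e):
    """Check m_p(S) <= m_e(Gamma_p(S)) for every nonempty set S of goods.

    Brute force over all 2^m - 1 sets, so only suitable for tiny examples.
    """
    eq = equality_edges(u, p)
    m = len(p)
    for size in range(1, m + 1):
        for S in combinations(range(m), size):
            gamma = {i for (j, i) in eq if j in S}   # Gamma_p(S)
            if sum(Fraction(p[j]) for j in S) > sum(Fraction(e[i]) for i in gamma):
                return False
    return True


u, p = [[2, 1], [1, 1]], [1, 1]
print(satisfies_invariant_2(u, p, e=[1, 1]))               # True
print(satisfies_invariant_2(u, p, e=[1, Fraction(1, 2)]))  # False: good 1 alone is oversold
```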

Since the DPSV algorithm starts with an arbitrary price vector satisfying the 
invariant, and only increases the prices until it reaches the equilibrium, it proves the following stronger statement. We will use this observation in the
analysis of our algorithm. 

Theorem B (Devanur et al. [6]). Let $p^0$ be a price vector satisfying Invariant 1. Then there is a market-clearing price vector $p^*$ such that $p^*_j \ge p^0_j$ for every good $j$. Furthermore, $p^*$ can be computed in polynomial time.

In this paper, we present an algorithm that computes an approximate mar- 
ket equilibrium in the setting of the Arrow-Debreu theorem (where there is no 
dichotomy between buyers and sellers) assuming that the utility functions are 
linear. This “approximately” answers an open question of [6]. 

Since the market equilibrium problem is not an optimization problem, we 
need to clarify what we mean by an approximate market equilibrium. Deng et 
al. [5] presented the following natural definition for the notion of approximate 
market equilibria. 
Definition 1. An $\varepsilon$-approximate equilibrium for a market is a price vector $p^*$ and a bundle $x^i$ for each agent $i$ such that

- The market approximately clears, i.e., for every good $j$, $(1 - \varepsilon)\sum_i w^i_j \le \sum_i x^i_j \le \sum_i w^i_j$.

- For all $i$, the utility of agent $i$ is at least $(1 - \varepsilon)$ times the value of the optimum solution of the maximization program (1).
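To make the two conditions concrete, the Python sketch below checks a candidate pair $(p, x)$ for linear utilities. It assumes positive prices, reads "approximately clears" as $(1-\varepsilon)\sum_i w^i_j \le \sum_i x^i_j \le \sum_i w^i_j$, and additionally requires each bundle to be affordable, since the utility condition refers to the budget-constrained program (1). It relies on the fact, used in the analysis of Section 4, that for linear utilities the optimum of (1) is the agent's budget times her bang per buck. The function name and the tolerance parameter are our own choices.

```python
def is_eps_approx_equilibrium(u, w, p, x, eps, tol=1e-12):
    """Check Definition 1 for linear utilities u_i(x) = sum_j u[i][j] * x[j].

    Assumptions made for this sketch: all prices are positive, "approximately
    clears" is read as (1 - eps) * supply_j <= demand_j <= supply_j, and every
    bundle x[i] must be affordable at prices p.
    """
    n, m = len(u), len(p)
    for j in range(m):
        supply = sum(w[i][j] for i in range(n))
        sold = sum(x[i][j] for i in range(n))
        if not ((1 - eps) * supply - tol <= sold <= supply + tol):
            return False
    for i in range(n):
        budget = sum(p[j] * w[i][j] for j in range(m))
        if sum(p[j] * x[i][j] for j in range(m)) > budget + tol:
            return False
        # With linear utilities, the optimum of program (1) is budget * bang per buck.
        optimum = budget * max(u[i][j] / p[j] for j in range(m))
        if sum(u[i][j] * x[i][j] for j in range(m)) < (1 - eps) * optimum - tol:
            return False
    return True


u = [[1, 0], [0, 1]]      # each agent values only her own good
w = [[1, 0], [0, 1]]
p = [1, 1]
print(is_eps_approx_equilibrium(u, w, p, x=w, eps=0.1))                 # True
print(is_eps_approx_equilibrium(u, w, p, x=[[0, 1], [1, 0]], eps=0.1))  # False
```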



3 The Algorithm 

In this section, we present two algorithms for computing market-clearing prices in a market with m types of goods and n agents, each having an initial endowment $w^i$ of goods and a linear utility function $u_i(x) = \sum_j u_{ij} x_j$. The first algorithm is similar in nature to the DPSV algorithm, and is based on the simple approach of increasing the price of oversold items until we reach an equilibrium. Since we are unable to analyze the running time of this algorithm, we present a modification of it which, as we prove using Theorem B, computes an approximate equilibrium in polynomial time.

Before we present the algorithm, we define the equality subgraph corresponding to the price vector $p$. The definition is similar to the definition of the equality subgraph in a dichotomous market presented in Section 2, except here the budget of each buyer is a function of prices. More precisely, the equality subgraph has $m$ vertices in the first part $A$, $n$ vertices in the second part $B$, equality edges between $A$ and $B$ as defined in Section 2, an edge of capacity $p_j$ from the source $s$ to the vertex $a_j \in A$, and an edge of capacity $\sum_j p_j w^i_j$ from the vertex $b_i \in B$ to the sink $t$. We will denote this equality subgraph by $N'(p)$ to avoid confusion with the equality subgraph for dichotomous markets defined in Section 2. The money of a set $S$ (denoted by $m_p(S)$) is defined as in Section 2, using $\sum_j p_j w^i_j$ as the budget of buyer $i$.

For a set $S \subseteq A$, we define the deficiency of $S$ (denoted by $\mathrm{def}_p(S)$) as $m_p(S) - m_p(\Gamma_p(S))$. The maximum deficiency of the price vector $p$ (denoted by $\mathrm{maxdef}(p)$) is the maximum value of $\mathrm{def}_p(S)$ over all $S \subseteq A$. The following fact is easy to observe.

Proposition 1. Assume $p$ is a price vector and the budgets defined above are non-zero. Let $\{s\} \cup S \cup T$ be the $s$-side of the minimum $st$-cut in $N'(p)$. Then $T = \Gamma_p(S)$, and the deficiency of the set $S$ is equal to the maximum deficiency of $p$.

We call a set S with def(S) = maxdef(p) a maximally deficient set with 
respect to p. By the above fact, finding a maximally deficient set is equivalent 
to finding a minimum st-cut in $N'(p)$.
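By Proposition 1, the maximum deficiency can be read off a single max-flow computation: an $s$-side $\{s\} \cup S \cup T$ of finite capacity must have $T \supseteq \Gamma_p(S)$, so the minimum cut equals $\sum_j p_j - \mathrm{maxdef}(p)$. The Python sketch below (ours, not the authors' implementation) checks this identity against brute-force enumeration on small instances. It replaces the infinite capacities of equality edges by a sufficiently large finite number and uses a textbook Edmonds-Karp max-flow with exact rational arithmetic; all function names are hypothetical.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def equality_edges(u, p):
    """Pairs (j, i) with u[i][j] / p[j] equal to buyer i's bang per buck."""
    alpha = [max(Fraction(x) / Fraction(y) for x, y in zip(ui, p)) for ui in u]
    return {(j, i) for i, ui in enumerate(u) for j in range(len(p))
            if Fraction(ui[j]) / Fraction(p[j]) == alpha[i]}


def budgets(w, p):
    """Budget of agent i in N'(p): sum_j p_j * w[i][j]."""
    return [sum(Fraction(pj) * Fraction(wij) for pj, wij in zip(p, wi)) for wi in w]


def maxdef_brute(u, w, p):
    """max over S of m_p(S) - m_p(Gamma_p(S)); the empty set contributes 0."""
    eq, b, m = equality_edges(u, p), budgets(w, p), len(p)
    best = Fraction(0)
    for size in range(1, m + 1):
        for S in combinations(range(m), size):
            gamma = {i for (j, i) in eq if j in S}
            best = max(best, sum(Fraction(p[j]) for j in S) - sum(b[i] for i in gamma))
    return best


def maxdef_flow(u, w, p):
    """maxdef(p) = sum_j p_j - maxflow(N'(p)), with Edmonds-Karp max-flow."""
    m, n = len(p), len(u)
    s, t = 0, m + n + 1          # goods are nodes 1..m, buyers m+1..m+n
    cap = [[Fraction(0)] * (t + 1) for _ in range(t + 1)]
    b = budgets(w, p)
    big = sum(Fraction(x) for x in p) + sum(b) + 1   # stands in for infinite capacity
    for j in range(m):
        cap[s][1 + j] = Fraction(p[j])
    for i in range(n):
        cap[1 + m + i][t] = b[i]
    for (j, i) in equality_edges(u, p):
        cap[1 + j][1 + m + i] = big
    flow = Fraction(0)
    while True:
        parent = [-1] * (t + 1)  # BFS for a shortest augmenting path
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            x = queue.popleft()
            for y in range(t + 1):
                if parent[y] == -1 and cap[x][y] > 0:
                    parent[y] = x
                    queue.append(y)
        if parent[t] == -1:
            break
        aug, y = big, t
        while y != s:            # bottleneck capacity along the path
            aug = min(aug, cap[parent[y]][y])
            y = parent[y]
        y = t
        while y != s:            # update residual capacities
            cap[parent[y]][y] -= aug
            cap[y][parent[y]] += aug
            y = parent[y]
        flow += aug
    return sum(Fraction(x) for x in p) - flow


u, w, p = [[1, 0], [1, 1]], [[1, 1], [0, 0]], [1, 1]
print(maxdef_brute(u, w, p), maxdef_flow(u, w, p))  # 1 1
```

In the example, agent 1 owns nothing, so her budget is zero and good 1, which only she wants at the maximal rate, forms a set of deficiency 1. (Proposition 1 assumes non-zero budgets; the flow identity itself does not need that assumption.)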

We are now ready to state our first algorithm. 
Algorithm 1

1. Start from an arbitrary price vector, say $p := (1, 1, \ldots, 1)$.

2. Find the largest maximally deficient set $S$. Let $D = \mathrm{def}(S)$. If $D = 0$ then stop.

3. Remove all equality edges between $A \setminus S$ and $\Gamma_p(S)$ from $N'(p)$.

4. Increase the prices of the goods in $A \setminus S$ continuously and at the same rate (i.e., multiply these prices by a factor $\delta$ initially equal to 1, and increase $\delta$ continuously), until one of the following events occurs:

(a) A new equality edge is added to $N'(p)$.

(b) For a set $S' \not\subseteq S$, the deficiency of $S'$ becomes equal to $D$.

In either case, continue from Step 2. If none of the above events happens for any value of $\delta \ge 1$, then proceed to the next step.

5. Set the prices of the goods in $S$ to zero, remove these goods from the set of goods, and start again from Step 2.



Step 4 in the above algorithm can be implemented using binary search over values of $\delta$ or using a parametric network flow algorithm [7] to find the first event that occurs. Notice that Step 5 in the above algorithm is only for taking care of (pathological) cases where in the equilibrium some of the prices are zero. If, for example, we assume that each agent has a non-zero utility for each good (i.e., $u_{ij} > 0$ for every $i, j$), then we will not need this step.

The intuition behind Algorithm 1 is simple: it is easy to observe that if the 
maximum deficiency of the initial price vector $p^0$ is $D_0$, then the algorithm never lets the maximum deficiency of $p$ increase beyond $D_0$. On the other hand, the
algorithm keeps increasing the total price of all goods. Therefore, the ratio of 
the maximum deficiency to the total prices will converge to zero. However, since 
in each step we might only slightly increase the prices, we were unable to prove 
any polynomial upper bound on the running time of Algorithm 1. Instead, we 
will change the algorithm to use the DPSV algorithm as a subroutine in each 
iteration. This enables us to prove a polynomial bound on the time it takes until 
the algorithm reaches an approximate equilibrium. 

Algorithm 2 

1. Start from an arbitrary price vector, say $p := (1, 1, \ldots, 1)$.

2. Let $D := \mathrm{maxdef}(p)$.

3. Construct an instance $M_p$ of a dichotomous market as follows: There are $m$ types of goods and $n + 1$ buyers. For $i = 1, \ldots, n$, the utility of buyer $i$ for the goods is the same as the utility of the corresponding agent in the original instance. Also, the budget of buyer $i$ is $e_i := \sum_j p_j w^i_j$. The $(n+1)$'th buyer has a budget of $e_{n+1} := D$, and its utility for good $j$ is equal to $p_j$ (i.e., at price $p$, buyer $n+1$ is equally interested in all goods).

4. Run the DPSV algorithm on the instance $M_p$ starting from the price vector $p$. Let $p'$ denote the output of this algorithm.

5. For every agent $i$, let $e'_i := \sum_j p'_j w^i_j$ be the budget of $i$ with respect to $p'$. If $e'_i / e_i \le 1 + \varepsilon$ for every agent $i$, then output $p'$ and stop.

6. Let $p := p'$. Go to Step 2.
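Steps 3 and 5 are simple bookkeeping around the DPSV call. A sketch of just these two steps in Python follows; the function names are hypothetical and the DPSV subroutine of Step 4 is deliberately left out, so this is not a full implementation of Algorithm 2.

```python
from fractions import Fraction


def build_dichotomous_instance(u, w, p, D):
    """Step 3: utilities and budgets of M_p (buyers 0..n-1 plus dummy buyer n)."""
    utilities = [list(ui) for ui in u] + [list(p)]  # dummy buyer values good j at p_j
    budgets = [sum(Fraction(pj) * Fraction(wij) for pj, wij in zip(p, wi)) for wi in w]
    budgets.append(Fraction(D))                     # e_{n+1} := D
    return utilities, budgets


def should_stop(w, p_old, p_new, eps):
    """Step 5: stop when e'_i / e_i <= 1 + eps for every agent i (budgets assumed positive)."""
    for wi in w:
        e_old = sum(Fraction(pj) * Fraction(wij) for pj, wij in zip(p_old, wi))
        e_new = sum(Fraction(pj) * Fraction(wij) for pj, wij in zip(p_new, wi))
        if e_new > (1 + Fraction(eps)) * e_old:
            return False
    return True


u = [[1, 2], [3, 0]]
w = [[1, 0], [0, 1]]
utilities, budgets = build_dichotomous_instance(u, w, p=[2, 3], D=1)
print(utilities)  # [[1, 2], [3, 0], [2, 3]]
print(budgets)    # [Fraction(2, 1), Fraction(3, 1), Fraction(1, 1)]
print(should_stop(w, [1, 1], [2, 1], Fraction(1, 10)))  # False: agent 0's budget doubled
```

Passing $\varepsilon$ as a `Fraction` keeps the stopping test exact; a float such as `0.1` would also work but is converted to its binary approximation.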

We will show in the next section that after at most polynomially many iter- 
ations, Algorithm 2 finds an e-approximate market equilibrium. 

4 Analysis 

In this section we will prove that Algorithm 2 is correct (i.e., it computes an $\varepsilon$-approximate market equilibrium) and terminates in polynomial time. We start
with the following simple lemma, which shows that the price vector p satisfies 
Invariant 2 of the DPSV algorithm on the instance Mp, and therefore in Step 4 
of Algorithm 2 we can run the DPSV algorithm with the initial price vector p. 

Lemma 1. In Step 4 of Algorithm 2, the price vector p satisfies Invariant 2 of 
the DPSV algorithm on the instance Mp . 

Proof. It is enough to notice that by the definition, at the price $p$, the buyer $n + 1$ is interested in all goods. Therefore, adding this buyer to the set of buyers decreases the deficiency of every nonempty set by the budget of buyer $n + 1$, which is $D$. Therefore, after adding buyer $n + 1$, the maximum deficiency is non-positive. Thus, $p$ satisfies Invariant 2 on the instance $M_p$. □

The following lemma shows that when Algorithm 2 stops in Step 5, it must have found an $\varepsilon$-approximate market equilibrium.

Lemma 2. Assume Algorithm 2 terminates and outputs the price vector $p^* := p'$. Then there exists a bundle $x^i$ for each agent $i$ such that

- The market clears, i.e., for every good $j$, $\sum_i x^i_j = \sum_i w^i_j$.

- For all $i$, the utility of agent $i$ is at least $(1 - \varepsilon)$ times the value of the optimum solution of the maximization program (1).

Therefore, the price vector $p^*$ together with the allocation $x$ is an $\varepsilon$-approximate market equilibrium.

Proof. Consider the instance $M_p$ constructed in the last iteration of the algorithm, and the equality subgraph $N(p')$ for this instance. Find a maximum flow from $s$ to $t$ in this network, and let $y^i_j$ denote the amount of flow from $a_j$ to $b_i$ divided by $p'_j$. Thus, the total amount of flow entering the vertex $b_i$ is $\sum_j p'_j y^i_j$. Therefore, since $p'$ is a market-clearing price in $M_p$, we must have $\sum_j p'_j y^i_j = e_i$ for every $i$. By Theorem B we have $p' \ge p$ and therefore $e'_i \ge e_i$ for every $i$. This shows that the allocation $y^i$ does not violate the budget constraint of agents. Also, by the termination condition of the algorithm, we have $e_i \ge e'_i/(1 + \varepsilon) \ge (1 - \varepsilon)e'_i$. Thus, $p' \cdot y^i \ge (1 - \varepsilon)e'_i$. That is, every agent uses at least a $(1 - \varepsilon)$ fraction of her budget. Since utility functions are linear, we know that the solution of the maximization program (1) is precisely the budget of agent $i$ times the bang per buck for agent $i$. By the definition of the equality subgraph, the agent only buys goods that have the highest bang per buck for her. Therefore, the utility that agent $i$ has for the allocation $y^i$ is at least a $(1 - \varepsilon)$ fraction of the utility of her optimal bundle. Thus, the allocation $y^i$ satisfies the second condition.

In order to satisfy the first condition, we change the allocation $y^i$ as follows: by the principle of conservation of money, the total extra money that the agents have after buying the bundles $y^i$ is equal to the total price of the unsold goods. We distribute these goods among the agents arbitrarily, so that all goods are sold (i.e., the market clears). Let $x^i$ denote the resulting allocations. Since doing so does not decrease the utility of any agent, the allocation $x^i$ satisfies both conditions of the lemma. □

Lemmas 1 and 2 together prove that Algorithm 2 is correct. Now, we only 
need to show that it terminates after polynomially many iterations. This is based 
on the fact that the price vector p in Algorithm 2 satisfies the following invariant. 

Lemma 3. Algorithm 2 never increases the maximum deficiency of the price 
vector p. 

Proof. We need to show that the maximum deficiency of the price vector $p'$ computed in Step 4 is not more than $D$ (the maximum deficiency of $p$). Since the output $p'$ of the DPSV algorithm must satisfy Invariant 2, we have $m_{p'}(S) \le m_e(\Gamma'_{p'}(S))$ for every set $S$ of goods in $M_p$, where $\Gamma'_{p'}(S)$ denotes the set of buyers that have an equality edge from the goods in $S$ in the equality subgraph $N(p')$ for the instance $M_p$ (we use $\Gamma'$ instead of $\Gamma$ to indicate the presence of the dummy buyer $n + 1$), and $m_e(\Gamma'_{p'}(S))$ is computed using the budgets $e_i := \sum_{j=1}^m p_j w^i_j$. Therefore, if we remove the buyer $n + 1$ (whose budget is $D$) from this instance, we still have

$$m_{p'}(S) - m_e(\Gamma'_{p'}(S) \setminus \{n+1\}) \le D \qquad (2)$$

for every set $S$.

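On the other hand, by Lemma 2 and Theorem B, the price vector $p'$ must satisfy $p'_j \ge p_j$ for every good $j$, and hence $e'_i \ge e_i$ for every agent $i$. Therefore, we have

$$m_e(\Gamma'_{p'}(S) \setminus \{n+1\}) = \sum_{i \in \Gamma'_{p'}(S) \setminus \{n+1\}} e_i \le \sum_{i \in \Gamma'_{p'}(S) \setminus \{n+1\}} e'_i = m_{p'}(\Gamma_{p'}(S)). \qquad (3)$$

By Equations 2 and 3, we have

$$\mathrm{def}_{p'}(S) = m_{p'}(S) - m_{p'}(\Gamma_{p'}(S)) \le m_{p'}(S) - m_e(\Gamma'_{p'}(S) \setminus \{n+1\}) \le D.$$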

This completes the proof of the lemma. □ 
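To make the quantity bounded in Lemma 3 concrete, the following Python sketch computes the maximum deficiency of a tiny instance by brute force over all sets $S$ of goods. It assumes (our reading, not spelled out in this excerpt) that $m_p(S)$ is the total price of the goods in $S$ and that $m(\Gamma(S))$ is the total budget of the buyers with an equality edge to some good in $S$. The equality edges are passed in directly instead of being derived from utilities, and the function name is hypothetical.

```python
from itertools import combinations


def max_deficiency(prices, budgets, gamma):
    """Brute-force maximum deficiency max_S (m_p(S) - m(Gamma(S))).

    prices[j]  -- price p_j of good j
    budgets[i] -- budget of buyer i
    gamma[j]   -- set of buyers with an equality edge to good j
    Assumes m_p(S) is the total price of S and m(Gamma(S)) is the total
    budget of the buyers adjacent to S. Exponential in the number of goods,
    so this is only meant for tiny hand-checked examples.
    """
    best = 0  # the empty set has deficiency 0
    for size in range(1, len(prices) + 1):
        for S in combinations(range(len(prices)), size):
            buyers = set().union(*(gamma[j] for j in S))
            deficiency = sum(prices[j] for j in S) - sum(budgets[i] for i in buyers)
            best = max(best, deficiency)
    return best


# Two goods that are both adjacent only to buyer 0.
print(max_deficiency([2, 3], [1, 1], [{0}, {0}]))  # 4
```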

We are now ready to analyze the running time of Algorithm 3. 

Lemma 4. Let $e_{\min} := \min_i e_i$ denote the minimum budget $e_i$ in the first iteration of the algorithm. Then Algorithm 2 terminates after at most $O(\frac{n}{\varepsilon}\log(\frac{m}{\varepsilon e_{\min}}))$ iterations.

Proof. By Theorem B we have $p' \ge p$ and therefore $e'_i \ge e_i$ for every $i$. On the other hand, we have

$$\sum_i e'_i = \sum_j p'_j = \sum_j p_j + D = \sum_i e_i + D.$$

Therefore, for every $i$,

$$e'_i - e_i \le D. \qquad (4)$$

Let $D_0$ denote the maximum deficiency of the original price vector. By Lemma 3, the value of $D$ in Algorithm 3 is always less than or equal to $D_0$. Also, $D_0 \le m$ by definition. Therefore, by Equation (4), we have $e'_i - e_i \le m$. Thus,

$$\frac{e'_i}{e_i} \le 1 + \frac{m}{e_i}.$$

By the above inequality, the event $e'_i/e_i > 1+\varepsilon$ can happen only if $m/e_i > \varepsilon$, i.e., $e_i < m/\varepsilon$. However, if this event happens in some iteration, then the value of $e_i$ in the next iteration (which is the same as the value of $e'_i$ in the current iteration) will grow by a factor of at least $1+\varepsilon$. This means that after $k = O(\frac{1}{\varepsilon}\log(\frac{m}{\varepsilon e_{\min}}))$ occurrences of the event $e'_i/e_i > 1+\varepsilon$, the value of $e_i$ will be at least $e_{\min}(1+\varepsilon)^k \ge m/\varepsilon$, and therefore, by the above observation, the event $e'_i/e_i > 1+\varepsilon$ cannot happen anymore. On the other hand, in every iteration in which the algorithm does not stop, this event must happen for at least one $i$. Thus, after at most $O(\frac{n}{\varepsilon}\log(\frac{m}{\varepsilon e_{\min}}))$ iterations the algorithm stops. □
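The counting step in this proof can be checked numerically. The Python sketch below (hypothetical helper names) multiplies a budget starting at $e_{\min}$ by $1+\varepsilon$ until it reaches $m/\varepsilon$, the threshold above which the proof shows the event $e'_i/e_i > 1+\varepsilon$ is impossible, and compares the count with the closed form $\lceil \log(m/(\varepsilon e_{\min}))/\log(1+\varepsilon) \rceil$.

```python
import math


def occurrences_until_quiet(e_min, m, eps):
    """Number of (1 + eps)-fold increases a budget starting at e_min can
    undergo before it reaches m / eps."""
    e, k = e_min, 0
    while e < m / eps:
        e *= 1 + eps
        k += 1
    return k


def closed_form(e_min, m, eps):
    # Smallest k with e_min * (1 + eps)**k >= m / eps.
    return max(0, math.ceil(math.log(m / (eps * e_min)) / math.log1p(eps)))


print(occurrences_until_quiet(1.0, 10, 0.5), closed_form(1.0, 10, 0.5))  # 8 8
```

Because $\log(1+\varepsilon) \ge \varepsilon/2$ for $0 < \varepsilon \le 1$, this count is $O(\frac{1}{\varepsilon}\log\frac{m}{\varepsilon e_{\min}})$ per agent. The argument in the proof then charges every non-final iteration to at least one agent.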

Lemmas 2 and 4, together with the observation that $\log(1/e_{\min})$ is upper bounded by a polynomial in the size of the input, imply our main result.

Theorem 1. For every $\varepsilon > 0$, Algorithm 2 computes an $\varepsilon$-approximate market equilibrium in time polynomial in $1/\varepsilon$ and the size of the input.

Remark 1. Using Lemma 3 and the fact that in each iteration $\sum_j p'_j = \sum_j p_j + D$, it is straightforward to show that the ratio of the maximum deficiency to the total price of goods ($\mathrm{maxdef}(p)/\sum_j p_j$) in the $r$th iteration of Algorithm 2 is at most $1/r$. Therefore, if instead of the requirements of Definition 1 we only need the relative maximum deficiency to be less than $\varepsilon$, it is enough to run Algorithm 2 for $1/\varepsilon$ iterations.

5 Conclusions 

In this paper we presented a polynomial-time approximation scheme for comput- 
ing an approximate market equilibrium for a general market with linear utilities. 
The main problem that remains open is to obtain a polynomial-time algorithm 
for computing the exact equilibrium. We introduced Algorithm 1 as a candidate 
for such an algorithm, but have been unable to analyze the running time of this 
algorithm. The problem of analyzing the running time of Algorithm 1 is simi- 
lar in nature to the question left open in [6] on the running time of their basic 
algorithm. It is conjectured by Goemans that the basic algorithm of [6] runs in 
strongly polynomial time. A solution to this conjecture might be the first step 
toward analyzing the running time of Algorithm 1. 

Another interesting open question is to generalize the result of this paper 
or [6] to the case of strictly concave utility functions. In the Arrow-Debreu 
setting, strictly concave utility functions are more interesting than linear util- 
ity functions, since if the utilities are strictly concave, all optimal bundles are 
uniquely determined from the prices. Even for special classes of strictly con- 
cave utility functions, we do not know how to compute market-clearing prices 
efficiently. 

Throughout this paper, we assumed that we know the initial endowment 
and the utility of the participating agents. It would be interesting to consider 
scenarios where the agents are allowed to behave strategically in announcing 
their initial endowment or utility function. 
Abstract. We consider the problem of finding a minimum diameter spanning tree with maximum node degree $B$ in a complete undirected edge-weighted graph. We prove that the problem is NP-complete, and provide an $O(\sqrt{\log_B n})$-approximation algorithm for the problem. Our algorithm is purely combinatorial, and relies on a combination of filtering and divide and conquer.



1 Introduction 

The importance of algorithms for designing efficient networks in today’s inter- 
connected world can hardly be overstated. The operative word here is “efficient”, 
and indeed, there are many (often conflicting) ways to measure the efficiency of 
a network. Suppose a telecommunication company is building a communication 
network. While budgeting constraints may require the company to minimize to- 
tal cost, there are also quality of service and technological constraints which may 
require the network to have low diameter and low degree. 

Low diameter is essential to ensure that any pair of nodes can communicate fast. It is also useful to enforce reliability constraints, as explained in the following (see also [8] and [13]): Assume that an edge $e$ fails with probability $1 - p_e$, and that all failures occur independently. Then, the probability that a path $e_1, e_2, \ldots, e_k$ is operational is $p_{e_1} \times p_{e_2} \times \cdots \times p_{e_k}$. Given a certain threshold value for the desired reliability, there is a corresponding parameter $D$ such that the diameter of the network defined by the edge lengths $(|\log p_e|)_{e \in E}$ is required to be at most $D$. Therefore, the reliability constraint is transformed into a diameter constraint.
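A small Python sketch of this transformation (the function names are illustrative): with lengths $|\log p_e|$, the length of a path equals $-\log$ of its survival probability, so a reliability threshold $\rho$ turns into the length budget $D = |\log \rho|$.

```python
import math


def edge_lengths(p):
    """Edge lengths |log p_e| for survival probabilities p_e in (0, 1]."""
    return {e: abs(math.log(pe)) for e, pe in p.items()}


def path_reliability(path, p):
    """Probability that all edges on the path work, with independent failures."""
    return math.prod(p[e] for e in path)


def length_bound(rho):
    """Length budget D: a path is at least rho-reliable iff its length is <= D."""
    return abs(math.log(rho))


p = {"e1": 0.9, "e2": 0.8}
lengths = edge_lengths(p)
total = sum(lengths[e] for e in ["e1", "e2"])
print(round(path_reliability(["e1", "e2"], p), 4))              # 0.72
print(total <= length_bound(0.7), total <= length_bound(0.75))  # True False
```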

Degree-constraints appear naturally in graph-theoretic abstractions of communication network design problems. As an example, consider the so-called IP multicast [4,5] problem, where we would like to disseminate centrally stored information from a server node to a set of client hosts. The standard solution is
to compute a tree in the given graph that spans the server node and all client 
nodes. We then send data packets from the root along each of its incident edges 
in the tree. An internal node forwards incoming information to its descendants in 
the tree. The degree of a node in this tree is proportional to the amount of work 
that the node has to do and it is hence natural to aspire to compute spanning 
trees of low maximum degree (see also [1,2,3]). 

Our work is motivated by precisely these considerations. We proceed by defin- 
ing our problem. 

1.1 Problem Definition 

Formally, we consider the following BOUNDED DEGREE MINIMUM DIAMETER SPANNING TREE PROBLEM (BDST): given an undirected complete graph $G = (V, E)$ whose edges are endowed with a metric length function $\{l_e\}_{e \in E}$ and a parameter $B \ge 2$, we want to find a spanning tree $T$ of $G$ of maximum node-degree at most $B$. At the same time we want to minimize the diameter of $T$, i.e., we would like to minimize

$$\Delta(T) := \max_{u,v \in V} \mathrm{dist}^l_T(u,v),$$

where $\mathrm{dist}^l_T(u,v)$ denotes the $l$-length of the unique $u,v$-path in $T$.

Let the height of a tree $T$ rooted at node $r$ be the maximum number of edges on any $r,w$-path, where $w$ is a leaf node in $T$, and denote it by height($T$). We also use $n$ and $m$ to denote $|V|$ and $|E|$, respectively.
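For reference, here is a minimal Python sketch of the two quantities just defined, $\Delta(T)$ and height($T$), for a tree given as an edge list. It runs a traversal from every node, which is quadratic but sufficient for small examples; all names are illustrative.

```python
from collections import defaultdict


def build_tree(edges):
    """Adjacency lists for a tree given as (u, v, length) triples."""
    adj = defaultdict(list)
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
    return adj


def distances_from(adj, source, unit=False):
    """Length (or edge count if unit=True) of the unique source,v-path to every v."""
    dist = {source: 0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, length in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + (1 if unit else length)
                stack.append(v)
    return dist


def diameter(adj):
    """Delta(T): the largest path length over all pairs of nodes."""
    return max(max(distances_from(adj, u).values()) for u in list(adj))


def height(adj, root):
    """Maximum number of edges on a root,leaf-path."""
    return max(distances_from(adj, root, unit=True).values())


T = build_tree([("r", "a", 2), ("r", "b", 3), ("a", "c", 1)])
print(diameter(T), height(T, "r"))  # 6 2
```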

For $B = 2$, BDST can be approximated within a constant using approximation algorithms for the Traveling Salesperson problem. In this paper we consider the case $B \ge 3$.

1.2 Results and Paper Outline 

Our main result is an $O(\sqrt{\log_B n})$ approximation algorithm for BDST. The algorithm is described and analyzed in Section 2. There are two main ideas in the algorithm. First, we break up the graph into clusters of low diameter. For each cluster, we compute a balanced $(B-1)$-ary tree. We then compute a global tree over the clusters, and show that the resulting tree has low diameter.

Our algorithm is the first known sub-logarithmic approximation for this problem. An $O(\log_B n)$ approximation is trivial: by the triangle inequality, no edge is longer than the optimum diameter, so any complete balanced $(B-1)$-ary spanning tree of the graph will do. We also prove the NP-completeness of BDST in Section 3. We conclude the paper with some open questions.
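The trivial baseline can be made concrete. In a metric, the triangle inequality gives $l_{uv} \le \mathrm{dist}_{T^*}(u,v) \le \Delta$ for every edge $uv$, so any spanning tree of height $h$ has diameter at most $2h\Delta$. The Python sketch below (hypothetical names) lays out a complete $(B-1)$-ary tree in heap order and checks its maximum degree and height.

```python
def balanced_tree(n, B):
    """Parent array of a complete (B-1)-ary tree on nodes 0..n-1 in heap layout:
    node i > 0 hangs below node (i - 1) // (B - 1)."""
    k = B - 1  # branching factor, so every node has degree at most B
    return [None] + [(i - 1) // k for i in range(1, n)]


def max_degree(parent):
    deg = [0] * len(parent)
    for child, par in enumerate(parent):
        if par is not None:
            deg[child] += 1
            deg[par] += 1
    return max(deg)


def tree_height(parent):
    depth = [0] * len(parent)
    for i in range(1, len(parent)):  # in heap layout parents precede children
        depth[i] = depth[parent[i]] + 1
    return max(depth)


parent = balanced_tree(13, 4)
print(max_degree(parent), tree_height(parent))  # 4 2
```

The height grows like $\log_{B-1} n$, which is $O(\log_B n)$ for $B \ge 3$.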

1.3 Related Work 

Ravi [12] considered the minimum poise spanning tree problem defined as
follows: given an unweighted graph $G = (V, E)$, we want to find a spanning tree
$T = (V, E_T)$ such that the sum of the maximum degree of a node in T and
the diameter of T is minimized. In order to provide an approximation algorithm 
for this problem, he presented an (0(log n), 0(log^ n))-bicriteria approximation 
for the BDST problem with a length metric defined by the distances in an 
unweighted graph G, with the restriction that we can only use edges from G. 

The MINIMUM DIAMETER SPANNING TREE PROBLEM is the following: given an undirected graph $G = (V, E)$ and a length function $\{l_e\}_{e \in E}$ defined over its edge set, we want to find a spanning tree of $G$ of minimum diameter. This problem is equivalent to finding the shortest paths tree from the absolute 1-center of $G$ (see [9]), and is therefore solvable in $O(mn + n^2 \log n)$ time.

The MINIMUM DEGREE SPANNING TREE PROBLEM is the following (see [6]): given an undirected graph $G = (V, E)$, we want to find a spanning tree of $G$ whose maximum node-degree is minimized. Fürer and Raghavachari [6] provided a polynomial time algorithm which computes a spanning tree with maximum degree at most $\Delta^* + 1$, where $\Delta^*$ denotes the smallest possible maximum degree of any spanning tree of the input graph. The algorithm in [6] also extends to Steiner trees.

Könemann and Ravi [10,11] studied the MINIMUM-COST DEGREE-BOUNDED SPANNING TREE problem, where in addition to an undirected graph and non-negative edge-costs we are also given a parameter $B_v \ge 1$ for each node $v \in V$. The objective is to find a minimum-cost spanning tree where the degree of each vertex $v \in V$ is at most $B_v$. The authors show how to compute a tree $T$ where each node $v$ has degree $O(B_v + \log n)$ and the cost of $T$ is $O(\mathrm{opt})$, where opt is the minimum cost of any tree obeying all degree-bounds exactly.



2 Algorithm and Analysis 

2.1 Overview 

The main idea behind our algorithm is filtering. Let $\alpha > 0$ be a threshold, where
distances more than $\alpha$ are called long and distances less than $\alpha$ are short. We
partition the node set of G into clusters such that the diameter of each cluster is 
short, but the number of clusters is also low. We do this by filtering the node set 
so that we retain one representative node for each cluster, and define an artificial 
degree bound for this representative node to account for the degree capacity of 
the entire cluster. 

We obtain our performance guarantee from the following two observations. 
Since the number of clusters is small, any balanced tree which spans the repre- 
sentatives has a small number of long edges. And since each cluster has small 
diameter, the overhead added to any path by the expansion of the representative 
nodes into trees spanning the clusters is also small. The rest of this paper shows 
that such a threshold exists and yields our claimed performance guarantee. 
Algorithm 1 GlobTree($\{(v_i, B_i)\}_{i=1}^{l}$): Compute a tree $T$ on the nodes $\{v_1, \ldots, v_l\}$ such that node $v_i$ has node degree at most $B_i$ for all $i$.

1: Assume $B_1 \ge \cdots \ge B_l$.
2: $T \leftarrow \emptyset$.
3: $d_i \leftarrow B_i$ for all $1 \le i \le l$.
4: for $i = 2$ to $l$ do
5:   Let $1 \le j < i$ be smallest with $d_j > 0$.
6:   Add edge $(v_j, v_i)$ to $T$.
7:   $d_j \leftarrow d_j - 1$.
8:   $d_i \leftarrow d_i - 1$.
9: end for
10: return Tree $T$ with root $v_1$.
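A direct Python transcription of Algorithm 1, offered as a sketch with 0-based node indices. Instead of rescanning for the smallest $j$ with $d_j > 0$, it keeps a pointer that only moves forward, which is valid because residual degrees never increase.

```python
def glob_tree(bounds):
    """Sketch of GlobTree on nodes 0..l-1, where bounds[i] is the bound of node i.

    Returns parent[i] for every node (None for the root, node 0).
    """
    if any(a < b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("Algorithm 1 assumes B_1 >= ... >= B_l")
    d = list(bounds)               # residual degrees d_i <- B_i
    parent = [None] * len(bounds)
    j = 0                          # every index before j has d <= 0
    for i in range(1, len(bounds)):
        while j < i and d[j] <= 0:
            j += 1
        if j == i:
            raise ValueError(f"no residual degree left to attach node {i}")
        parent[i] = j              # add edge (v_j, v_i)
        d[j] -= 1
        d[i] -= 1                  # the edge to the parent uses one unit of v_i
    return parent


print(glob_tree([3, 3, 3, 3, 3]))  # [None, 0, 0, 0, 1]
```

Attaching each new node to the earliest node with spare degree fills the tree level by level, which is what keeps its height small.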



2.2 Algorithm 

Given an appropriately chosen threshold $\alpha$, the first step of our algorithm is to find representatives $R = \{v_1, \ldots, v_l\} \subseteq V$ and a partition of $V$ into pairwise disjoint sets

$$V = V_1 \cup \cdots \cup V_l \qquad (1)$$

such that $v_i \in V_i$ and $\mathrm{dist}_l(v_i, u) \le 3\alpha$ for all $1 \le i \le l$ and for all $u \in V_i$. Roughly speaking, we then construct a low-degree and low-diameter tree on the nodes of $R$. This tree determines the global structure of our solution. In addition, we construct low-diameter degree-$B$-bounded trees for the nodes of each set $V_i$, $1 \le i \le l$. We finish by replacing the nodes from $R$ in the global solution by the respective spanning trees.

In the following we assume that we have a guess for the optimum diameter $\Delta$. This is justified since the diameter of an optimum tree is within the interval $[\max_{e \in E} l_e, n \cdot \max_{e \in E} l_e]$, and we can perform a binary search in order to find a proper approximate guess (i.e., a guess within twice the optimum diameter).
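One way to organize this search is sketched below in Python, under stated assumptions. It searches over the doubling grid $\max_e l_e \cdot 2^k$ rather than over all reals, and `accept` is a hypothetical monotone test standing in for "the algorithm run with this guess succeeds".

```python
import math


def search_guess(max_len, n, accept):
    """Binary search over the grid max_len * 2**k, k = 0..K, with
    K = ceil(log2 n) so the grid covers [max_len, n * max_len].

    accept(delta) is a hypothetical monotone test (False below a threshold t,
    True from t on). The returned grid value is below 2t whenever t lies in
    [max_len, n * max_len], because neighbouring grid values differ by 2x.
    """
    lo, hi = 0, max(0, math.ceil(math.log2(n)))
    while lo < hi:
        mid = (lo + hi) // 2
        if accept(max_len * 2 ** mid):
            hi = mid
        else:
            lo = mid + 1
    return max_len * 2 ** lo


print(search_guess(1.0, 100, lambda delta: delta >= 13.5))  # 16.0
```

The search needs only $O(\log \log n)$ calls to the test, because the grid has $O(\log n)$ points.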

We now detail the process of finding the partition from (1). We proceed in iterations: in iteration $1 \le i \le l$, we compute the set $V_i$ and its representative $v_i$. For ease of notation, we use $U_i^\gamma$ to denote the set of nodes that are at a distance of more than $\gamma$ from the first $i-1$ representatives $\{v_1, \ldots, v_{i-1}\}$. In order to define these sets formally, let $\mathrm{cov}_\gamma(v, U) = \{u \in U : \mathrm{dist}_l(v, u) \le \gamma\}$ be the set of nodes in $U$ that are within a distance of $\gamma$ of vertex $v$ (we also say that $v$ $\gamma$-covers the nodes in $\mathrm{cov}_\gamma(v, U)$). Then, we let $U_1^\gamma = V$ for all $\gamma > 0$. For $i > 1$ we define $U_i^\gamma = V \setminus \bigcup_{1 \le j \le i-1} \mathrm{cov}_\gamma(v_j, V)$.

Let $\alpha$ be a given contraction threshold. In iteration $i$, we then pick the vertex $v_i \in U_i^{3\alpha}$ that $\alpha$-covers the most nodes in $U_i^{3\alpha}$, i.e., we let $v_i = \arg\max_{v \in U_i^{3\alpha}} |\mathrm{cov}_\alpha(v, U_i^{3\alpha})|$ and $V_i = \mathrm{cov}_{3\alpha}(v_i, U_i^{3\alpha})$.

The algorithm stops as soon as all nodes in $V$ are within a distance of at most $3\alpha$ from some representative. We assume that this happens after $l$ iterations. We have $U_{l+1}^{3\alpha} = \emptyset$ and $U_i^{3\alpha} \neq \emptyset$ for all $1 \le i \le l$.
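The clustering loop can be sketched in Python as follows. This is an illustration under our own assumptions. `dist` is any metric on the nodes. Ties are broken by the smallest node label. Both the candidate representatives and the nodes being covered are restricted to nodes not yet assigned to an earlier $V_j$, which keeps the clusters pairwise disjoint as (1) requires.

```python
def filter_partition(nodes, dist, alpha):
    """Greedy filtering: repeatedly pick the uncovered node that alpha-covers
    the most uncovered nodes, and give it every uncovered node within 3*alpha.
    Returns a list of (representative, cluster) pairs."""
    uncovered = set(nodes)
    clusters = []
    while uncovered:
        rep = max(sorted(uncovered),
                  key=lambda v: sum(1 for u in uncovered if dist(v, u) <= alpha))
        cluster = {u for u in uncovered if dist(rep, u) <= 3 * alpha}
        clusters.append((rep, cluster))
        uncovered -= cluster      # rep is in its own cluster, so the loop ends
    return clusters


# Points on a line, with node labels equal to their positions.
clusters = filter_partition([0, 1, 2, 10, 11, 12, 13, 30], lambda a, b: abs(a - b), 1)
print([(rep, sorted(c)) for rep, c in clusters])
# [(1, [0, 1, 2]), (11, [10, 11, 12, 13]), (30, [30])]
```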
Algorithm 2 BDST($G$, $\Delta$): Compute a degree-$B$ tree of diameter no more than $O(\sqrt{\log_B n})\Delta$.

1: $\alpha \leftarrow \Delta/\sqrt{\log_B n}$.
2: $i \leftarrow 1$.
3: $U_i^{3\alpha} \leftarrow V$.
4: while $U_i^{3\alpha} \neq \emptyset$ do
5:   $v_i \leftarrow \arg\max_{v \in U_i^{3\alpha}} |\mathrm{cov}_\alpha(v, U_i^{3\alpha})|$.
6:   $V_i \leftarrow \mathrm{cov}_{3\alpha}(v_i, U_i^{3\alpha})$.
7:   $i \leftarrow i + 1$.
8: end while
9: Reorder $\{v_1, v_2, \ldots, v_l\}$ so that $|V_1| \ge |V_2| \ge \cdots \ge |V_l|$.
10: Compute $B_i$ as defined in (2).
11: $T^G \leftarrow$ GlobTree($\{(v_i, B_i)\}_{i=1}^{l}$).
12: for $i = 1$ to $l$ do
13:   $T_i \leftarrow$ tree spanning $V_i$ of degree at most $B$ and minimum height.
14:   Replace $v_i$ by $T_i$, and distribute the edges in $T^G$ incident on $v_i$ over the nodes of $T_i$ so that the maximum degree of any node in $T_i$ is at most $B$.
15: end for
16: return Resulting tree $T^{APX}$.



In order to compute the final tree, we go through two main steps. Assume that we have reordered the sets $\{V_i\}$ such that $|V_1| \ge |V_2| \ge \cdots \ge |V_l|$.

Global structure. For each $1 \le i \le l$, let the degree bound of node $v_i$ be



We then compute a tree $T^G = $ GlobTree($\{(v_i, B_i)\}_{i=1}^{l}$) on the nodes $\{v_1, \ldots, v_l\}$ of low diameter. See Algorithm 1 for the details.

Local structure. For each $1 \le i \le l$ we construct a tree $T_i$ spanning the nodes of $V_i$ of minimum height such that the degree of each node is at most $B$.

Finally, we compute the final tree $T$ by taking the global tree $T^G$, which is rooted at $v_1$, and replacing each node $v_i$ by the tree $T_i$. We distribute the edges that are incident to $v_i$ in $T^G$ over the nodes of $T_i$ evenly, such that the maximum degree of any node of $T_i$ is as small as possible.
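A Python sketch of the local step, under our own assumptions. The minimum-height tree fills levels in breadth-first order, with $B$ children at the root and $B-1$ at every other node. "Evenly" is read as attaching each external edge to a node of currently smallest degree, which minimizes the resulting maximum degree.

```python
import heapq


def local_tree(k, B):
    """Parent array of a minimum-height tree on k >= 1 nodes with max degree B:
    the root gets B children and every other node B - 1, in BFS order."""
    return [None] + [0 if j <= B else (j - B - 1) // (B - 1) + 1 for j in range(1, k)]


def distribute(parent, B, external):
    """Attach `external` global-tree edges to local-tree nodes, each time to a
    node of currently smallest degree. Returns the chosen nodes and the
    resulting maximum degree."""
    deg = [0] * len(parent)
    for child, par in enumerate(parent):
        if par is not None:
            deg[child] += 1
            deg[par] += 1
    heap = [(d, v) for v, d in enumerate(deg)]
    heapq.heapify(heap)
    owners = []
    for _ in range(external):
        d, v = heapq.heappop(heap)
        if d >= B:
            raise ValueError("cluster has no degree capacity left")
        owners.append(v)
        heapq.heappush(heap, (d + 1, v))
    return owners, max(d for d, _ in heap)


print(distribute(local_tree(4, 3), 3, 6))  # ([1, 2, 3, 1, 2, 3], 3)
```

For four nodes and $B = 3$, the local tree absorbs six external edges. This matches the free capacity $B|V_i| - 2(|V_i| - 1) = |V_i|(B-2) + 2$ of a tree on $|V_i|$ nodes. A seventh edge raises an error.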

An outline of the algorithm is shown in Algorithm 2. Its output is always a tree of degree no more than $B$. We do a binary search over $\Delta$ to obtain a tree of minimum diameter. In the following, we analyze the performance of the algorithm, assuming the correct value for $\Delta$ is fed to Algorithm 2.

2.3 Performance Ratio 

Theorem 1. Suppose that there is a tree $T^*$ with maximum node-degree $B$ and diameter $\Delta$. Then Algorithm BDST($G$, $\Delta$) produces a tree $T^{APX}$ with maximum node-degree $B$ and diameter $O(\sqrt{\log_B n} \cdot \Delta)$.

Theorem 1 is the main result we are trying to prove. We prove it at the end of this section, using a sequence of lemmas which follow.

Lemma 1. The maximum degree is no more than B. 

Proof. For any $i$, the degree of vertex $v_i$ in the global tree $T^G$ is bounded by $|V_i| \cdot (B-2) + 2$ for all $1 \le i \le l$. Also, tree $T_i$ has $|V_i|$ nodes, each with capacity $B$, and there are exactly $|V_i| - 1$ edges in $T_i$. Hence the total available capacity of the nodes in $V_i$ for edges outside $T_i$ is $B|V_i| - 2(|V_i| - 1) = |V_i| \cdot (B-2) + 2$. Hence, there is a way of distributing the edges of $T^G$ that are incident to node $v_i$ over all nodes of $T_i$ such that the maximum degree in $T^{APX}$ is at most $B$.

We now prove that $T^{APX}$ has diameter $O(\sqrt{\log_B n} \cdot \Delta)$. In the following, we say that an edge $uv \in E$ is short if $u, v \in V_i$ for some $1 \le i \le l$, and $uv$ is long otherwise. Our proof of the diameter bound has two parts: the first part shows that the maximum number of long edges on any root,leaf-path in $T^{APX}$ is $O(\sqrt{\log_B n})$. The second part shows that there are $O(\log_B n)$ short edges on any root,leaf-path in $T^{APX}$. This suffices, because the length of any edge in our input graph is at most $\Delta$ and the length of a short edge in $G$ is at most $6\alpha$ (using the triangle inequality).

First, we prove that any root,leaf-path contains at most $O(\sqrt{\log_B n})$ long edges. We begin by creating a partition of $V$ using $T^*$'s structure. We root $T^*$ at $v_1^*$ (chosen arbitrarily), and let $V_1^*$ be the set of nodes $u \in V$ such that the unique $(v_1^*, u)$-path in $T^*$ has length at most $\alpha$. We let $S^* = \{v_1^*\}$, and let the set of uncovered nodes be $U = V \setminus V_1^*$ initially.

We continue until there are no uncovered nodes remaining. In iteration $i > 1$, let $v_i^* \in U$ be an uncovered node of smallest depth in $T^*$ (i.e., $v_i^*$'s parent in $T^*$ is already covered). We then say that a node $u$ is covered by $v_i^*$ if $u$ is a descendant of $v_i^*$ and the length of the path from $v_i^*$ to $u$ in $T^*$ is at most $\alpha$. We let $V_i^*$ be the set of nodes in $U$ that are covered by $v_i^*$. We remove $V_i^*$ from $U$ and repeat.

Assume that the final partition has sets $V_1^*, \ldots, V_p^*$ and representatives $v_1^*, \ldots, v_p^*$. Since the subtree $T^*[V_i^*]$ of $T^*$ induced by the nodes of $V_i^*$ is connected, a counting argument shows that the nodes of $V_i^*$ have at most

$$B_i^* := \begin{cases} |V_i^*| \cdot (B-2) + 2 & : i = 1 \\ |V_i^*| \cdot (B-2) + 1 & : \text{otherwise} \end{cases} \qquad (3)$$

children from $V \setminus V_i^*$ in $T^*$. Order the sets such that $|V_1^*| \ge \cdots \ge |V_p^*|$, and let $T$ be the tree produced by GlobTree($\{v_i^*\}_{i=1}^p$, $\{B_i^*\}_{i=1}^p$).

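Definition 1. $\{(\hat V_i, \hat v_i)\}_{i=1}^p$ is called a proper collection of $V$ for a given node set $V$ if the following conditions hold:

1. $\hat V_i \subseteq V$ and $\hat v_i \in \hat V_i$ for all $1 \le i \le p$.
2. $\hat V_i \cap \hat V_j = \emptyset$ for all $1 \le i < j \le p$.
3. $|\hat V_1| \ge \cdots \ge |\hat V_p|$.
4. $\mathrm{dist}_l(\hat v_i, u) \le \alpha$ for all $1 \le i \le p$ and for all $u \in \hat V_i$.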
The following lemma is useful in order to prove that the height of the global tree $T^G$ is at most that of $T$.

Lemma 2. Let $\{(V_i, v_i)\}_{i=1}^{l}$ be a partition of $V$ together with a corresponding set of representatives created by Steps 1-8 of Algorithm 2. Let $\{(\hat V_i, \hat v_i)\}_{i=1}^{p}$ be a proper collection of $V$ as defined in Definition 1. We then must have

$$\sum_{i=1}^{j} |V_i| \ge \sum_{i=1}^{j} |\hat V_i| \qquad (4)$$

for all $1 \le j \le \max\{l, p\}$.

Proof. We prove the lemma by induction over $|V| = n$. For $n = 1$ the lemma is trivially satisfied, since in this case $V_1 = \hat V_1 = V$. For $n > 1$, assume that the lemma holds for all node sets with at most $n - 1$ nodes.

Assume now, for the sake of contradiction, that the lemma does not hold. Let $j$ be the minimum index such that $\sum_{i=1}^{j} |V_i| < \sum_{i=1}^{j} |\hat V_i|$. Then, there must
exist an index $j_0 \le j$ such that

U 4". 

i<i<j 



and hence $\hat v_{j_0} \notin \mathrm{cov}_{3\alpha}(v_i, V)$ for all $1 \le i < j$. Notice that this implies $\hat v_{j_0} \in U_j^{3\alpha}$.

Now consider the application of the induction hypothesis for the set of nodes $V' = V \setminus \hat V_{j_0}$. Since $\hat V_{j_0} \cap \mathrm{cov}_\alpha(v_i, V) = \emptyset$ for all $1 \le i < j$, the application of our algorithm with $V'$ yields the exact same set of the first $j-1$ representatives $v_1, v_2, \ldots, v_{j-1}$ and the corresponding subsets $\{V_i \setminus \hat V_{j_0}\}_{i=1}^{j-1}$ of $V'$. Note that $\{(\hat V_i, \hat v_i)\}_{i \neq j_0}$ is a proper collection of $V'$. Therefore, by the induction hypothesis, we conclude that

$$\sum_{i=1}^{j-1} |V_i \setminus \hat V_{j_0}| \ge \sum_{i=1}^{j} |\hat V_i| - |\hat V_{j_0}|. \qquad (5)$$

Let us now lower-bound the difference $\sum_{i=1}^{j} |V_i| - \sum_{i=1}^{j-1} |V_i \setminus \hat V_{j_0}|$.

This difference can be expressed as the sum of two terms: the size of the set $V_j$ and the increase of the sizes of the biggest $j-1$ sets of our partition. Hence, we obtain



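$$\sum_{i=1}^{j} |V_i| \ge \sum_{i=1}^{j-1} |V_i \setminus \hat V_{j_0}| + \Big|\hat V_{j_0} \cap \bigcup_{1 \le i < j} \mathrm{cov}_{3\alpha}(v_i, V)\Big| + |V_j|. \qquad (6)$$

Observe that in the $j$-th iteration of our algorithm we could have chosen $\hat v_{j_0}$ as a representative instead of $v_j$, since $\hat v_{j_0} \in U_j^{3\alpha}$. Therefore, we must have

$$|V_j| \ge |\mathrm{cov}_\alpha(v_j, U_j^{3\alpha})| \ge |\mathrm{cov}_\alpha(\hat v_{j_0}, U_j^{3\alpha})|.$$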
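Using (5) together with (6), and noting that every node of $\hat V_{j_0}$ is either $3\alpha$-covered by some $v_i$ with $i < j$ or lies in $U_j^{3\alpha}$, where it is $\alpha$-covered by $\hat v_{j_0}$, so that

$$\Big|\hat V_{j_0} \cap \bigcup_{1 \le i < j} \mathrm{cov}_{3\alpha}(v_i, V)\Big| + |\mathrm{cov}_\alpha(\hat v_{j_0}, U_j^{3\alpha})| \ge |\hat V_{j_0}|,$$

finally yields

$$\sum_{i=1}^{j} |V_i| \ge \sum_{i=1}^{j} |\hat V_i|. \qquad (7)$$

This contradicts our assumption, and the lemma follows.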

Corollary 1. Let $\{V_i\}_{i=1}^{l}$ be the partition of $V$ generated by Step 9 of Algorithm 2, and $\{V_i^*\}_{i=1}^{p}$ be the partition of $V$ generated from the optimum tree. For all $1 \le j \le \max\{l, p\}$, we have:

$$\sum_{i=1}^{j} |V_i| \ge \sum_{i=1}^{j} |V_i^*|. \qquad (8)$$

Proof. By Lemma 2, the statement in (8) holds for the partition generated by Steps 1-8 of Algorithm 2, noting that $\{(V_i^*, v_i^*)\}_{i=1}^{p}$ is a proper collection of $V$ as defined in Definition 1.

The corollary follows by observing that reordering the sets of the partition by non-increasing size can only increase the left-hand side of (8) and does not change the right-hand side.

We can now prove that the height of the global tree $T^G$ is at most the height of the tree $T$.

Lemma 3. When $T$ is constructed from $T^*$ by GlobTree($\{v_i^*\}_{i=1}^{p}$, $\{B_i^*\}_{i=1}^{p}$), we have height($T^G$) $\le$ height($T$).

Proof. We say that the level of node $v$ of $T$ is the number of edges in the unique path from the root of $T$ to $v$. We now claim that the level of node $v_i$ in $T^G$ is at most the level of node $v_i^*$ in $T$ for all $1 \le i \le l$. We use induction over $i$ to prove the claim.

The claim is clear for $i = 1$. For $i > 1$, assume that GlobTree connects node $v_i^*$ to node $v_p^*$ for some $p < i$. It follows from (2), (3) and Corollary 1 that

$$\sum_{k=1}^{p} B_k \ge \sum_{k=1}^{p} B_k^*,$$

and hence there must exist a $1 \le p' \le p$ such that $d_{p'} > 0$ in GlobTree at the time when node $v_i$ is connected. By the induction hypothesis, we know that the level of $v_{p'}$ in $T^G$ is at most that of node $v_{p'}^*$ in $T$. It follows from the definition of GlobTree that the level of $v_{p'}^*$ is at most the level of $v_p^*$. Hence, the level of $v_i$ in $T^G$ is at most the level of $v_i^*$ in $T$, and this finishes the induction.

Observe that the height of $T^G$ is equal to the level of node $v_l$ in $T^G$, and that the height of $T$ equals the level of its last node $v_p^*$ in $T$. This implies the lemma.
Lemma 4. Let $T^G$ be a tree returned by GlobTree($\{(v_i, B_i)\}_{i=1}^{l}$). Then, $T^G$ must be a tree of minimum height among all trees that satisfy the given degree constraints.

Proof. Given a tree $T$, we define the following total order of the nodes in $T$. The order is a breadth-first-search order, with the refinement that the nodes of each level are ordered in non-increasing order of the sizes of their corresponding sets $V_i$. In particular, the nodes of $T$ are ordered $v_1, v_2, \ldots, v_l$ such that $i < j$ if level($v_i$) < level($v_j$), or level($v_i$) = level($v_j$) and $|V_i| \ge |V_j|$. By construction of $T^G$, we have that if $i < j$ in the total order of the nodes in $T^G$, then $|V_i| \ge |V_j|$, regardless of their levels. Moreover, every tree of minimum height for which this holds must have the same height as height($T^G$).

Assume, for the sake of contradiction, that there is a tree $T'$ such that $\deg_{T'}(v_i) \le B_i$ for all $1 \le i \le l$ and height($T'$) < height($T^G$). Let $v'_1, \ldots, v'_l$ be the total order induced by $T'$, as defined above. By the observation in the preceding paragraph, for some $i < j$, we have $|V'_i| < |V'_j|$. We call this an inversion and, without loss of generality, assume that $T'$ is a tree with the fewest inversions among all trees that satisfy the degree constraints and have height less than height($T^G$).

We show that we can reduce the number of inversions in $T'$ without increasing the tree's height. This contradicts the inversion-minimality of $T'$.

Let $\{v'_i, v'_j\}$ be an inversion in $T'$. We swap labels: relabel node $v'_i$ as $v'_j$ and relabel $v'_j$ as $v'_i$. The resulting tree may now violate the degree constraint at node $v'_i$. We counter this by moving a sufficient number of $v'_i$'s children to $v'_j$, which now occupies the earlier position (at a level no larger) and has the larger degree bound. This does not increase height($T'$), and reduces the number of inversions in $T'$, which is a contradiction.



Lemma 5. Any root,leaf-path in $T^G$ has at most $\sqrt{\log_B n}$ long edges.

Proof. Let $d^*$ denote the maximum number of long edges on any root,leaf-path in $T^*$. It follows from Lemma 4 that height($T$) $\le d^*$ and hence, together with Lemma 3, we have that height($T^G$) $\le d^*$.

By the construction of the partition $V_1^*, \ldots, V_p^*$, we know that a root,leaf-path $P$ in $T^*$ that contains $d^*$ long edges must have length at least $\alpha \cdot d^*$. Since $T^*$ has diameter at most $\Delta$, it then follows that $d^* \le \Delta/\alpha = \sqrt{\log_B n}$ by our choice of $\alpha$.

Lemma 5 bounds from above the contribution of long edges to the diameter of $T^{APX}$. We bound the contribution of short edges in the next lemma. For a root,leaf-path $P$ in $T^{APX}$, let $|P|_s$ denote the number of short edges in $P$.

Lemma 6. Let $P$ be an arbitrary root,leaf-path in $T^{APX}$. Then, $|P|_s = O(\log_B n)$.

Proof. Let $P_1$ and $P_2$ be two root,leaf-paths in $T^{APX}$, and let $P_1^G$ and $P_2^G$ be their images in $T^G$, i.e., $P_1^G = (v_1^1, \ldots, v_{l_1}^1)$ and $P_2^G = (v_1^2, \ldots, v_{l_2}^2)$.

We define a relation $\prec$ on two root,leaf-paths as follows. We say that $P_1 \prec P_2$ if $|V_j^1| \ge |V_j^2|$ for all $1 \le j \le \max\{l_1, l_2\}$, with $|V_j^i| = 0$ if $v_j^i$ does not exist. By construction of $T^G$, for every two paths $P_1$ and $P_2$ at least one of the following holds: $P_1 \prec P_2$ or $P_2 \prec P_1$. Moreover, if $P_1 \prec P_2$ then $l_2 \le l_1 + 1$.

Recall that $T_i$ denotes the local tree that spans the nodes of $V_i$. For the purpose of this proof, we assume that all edges of the form $(v_i, v_j)$ in $T^G$ such that $v_i$ is a parent of $v_j$ are attached to leaf nodes in $T_i$. This assumption only increases the number of short edges in root,leaf-paths, and hence is valid.

Consider two paths $P_1$ and $P_2$ such that $P_1 \prec P_2$. Since each $T_i$ is a balanced $(B-1)$-ary tree, we have $|P_2|_s \le |P_1|_s + |P_1| \le |P_1|_s + \log_B n$. We also have $|V_i| \le |V_{i-1}|$ for $i > 1$, by construction of $T^G$. Therefore, $|P_1|_s \le |P_2|_s + |P_1| + \mathrm{height}(T_1) \le |P_2|_s + 2\log_B n$. Hence, there exists a $\gamma$ such that $|P|_s \in [\gamma, \gamma + 2\log_B n]$ for all root,leaf-paths $P$ in $T^{APX}$.

Observe that on any root,leaf-path $P$ in $T^{APX}$, all but at most $O(\log_B n)$ of the short edges must be incident to nodes of degree $B$. This follows from the fact that $T^G$ has $O(\log_B n)$ levels. Since there are $n$ nodes in our graph, we must have that $\gamma = O(\log_B n)$. This finishes the proof of the lemma.

We are now ready to prove Theorem 1. 

Proof (of Theorem 1). Lemma 1 shows that $T^{APX}$ has maximum degree $B$.

Long edges in $T^{APX}$ have length no more than $\Delta$, since the graph has a spanning tree of diameter $\Delta$ and we are assuming we have the correct guess of $\Delta$. Hence, it follows from Lemma 5 that the contribution of long edges to the diameter of $T^{APX}$ is no more than $2\Delta\sqrt{\log_B n}$.

Short edges in $T^{APX}$ have length no more than $6\alpha = 6\Delta/\sqrt{\log_B n}$. Lemma 6 bounds the number of short edges in any root,leaf-path, so the total contribution of short edges to the diameter of $T^{APX}$ is no more than $O(\alpha \log_B n) = O(\Delta\sqrt{\log_B n})$.

All edges are either long or short; this completes the proof of the theorem.



3 Hardness 



In this section we prove that for any $B \ge 3$ the BDST problem is NP-hard. We prove the NP-completeness of BDST by reducing SET COVER to it.

Suppose we are given an instance of SET COVER $S$, specified by subsets $\{S_1, \ldots, S_m\}$ of a universe $U = \{u_1, \ldots, u_n\}$, and a number $C$. We want to find out if there is a sub-collection of at most $C$ subsets that covers $U$. We fix a parameter $B < C$, and convert $S$ into a graph $G(S)$ as follows.

The graph has four kinds of nodes. It has one node for every element of $U$. For each set $S_j$, the graph has $\lceil |S_j|/(B-1) \rceil$ nodes. Before we describe the other sets of nodes, we describe the edges between these two sets. Every element-set pair $(u_i, S_j)$ such that $u_i \in S_j$ is represented by an edge of length 1 between $u_i$ and one of the nodes representing $S_j$, such that every node that represents a subset has at most $(B-1)$ such adjacent edges.
$G(S)$ has a set of artificial nodes, as follows. There is a special artificial node called the root, denoted $r$. For each set $S_j$, we build a tree of degree at most $B$ such that the nodes representing $S_j$ are at the leaves of this tree. The root of the tree is connected to $r$ by a single edge of length 1. All other edges of these trees have length zero.

There are more artificial nodes to cover the artificial nodes described above. We build yet another degree-$B$ tree, whose leaves include all the nodes in the tree built for each set. The root of this tree is also connected to $r$ with an edge of length 1. In this tree, all edges incident to leaves have length 1, and all other edges have length 0.



We replace the root by a minimum height $(B-1)$-ary tree with $\lceil (C+2)/(B-1) \rceil$ leaves, in which all the inner nodes have exactly $(B-1)$ children. Let $r$ be the root of this tree. The edges of the tree have zero length, and every neighbor of the old root is now a neighbor of all the leaves of this tree, with edges of length 1. We denote the node set of this tree by ROOT.

We add another $(B-1)\lceil (C+2)/(B-1) \rceil - C$ extra nodes that are connected to the nodes of ROOT with edges of length 2. Note that there are at least 2 extra nodes. The BDST instance is defined by the metric closure of the above distances.
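The node counts of the construction are easy to tabulate. The Python sketch below reads the fractions in the construction as rounded up. This is our assumption, and it is also the reading under which the remark about at least two extra nodes holds, since $(B-1)\lceil (C+2)/(B-1) \rceil \ge C+2$. The function names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def gadget_sizes(set_sizes, C, B):
    """Node counts of some gadgets in G(S): the nodes representing each S_j,
    the leaves of the tree replacing the root, and the extra nodes."""
    set_nodes = [ceil_div(s, B - 1) for s in set_sizes]  # each takes <= B-1 elements
    root_leaves = ceil_div(C + 2, B - 1)
    extra = (B - 1) * root_leaves - C
    return set_nodes, root_leaves, extra


print(gadget_sizes([5, 2, 3], C=4, B=3))  # ([3, 1, 2], 3, 2)
```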



Lemma 7. $S$ has a set cover of size $C$ if and only if $G(S)$ has a degree-$B$-bounded spanning tree with diameter no more than 4.

Proof. Given a set cover of size $C$, we can embed it into $G(S)$ in the obvious way, with the BDST being rooted at $r$. Since every element is covered by the set cover, there is a path of length 2 from ROOT to every element of $U$. The edges that connect the extra nodes provide a path of length 2 to these nodes as well. The artificial tree constructed above provides a path of length 2 to all nodes which do not participate in the set cover. The degree constraints are automatically satisfied by construction. Hence, if $S$ has a set cover of size $C$, then $G(S)$ has a BDST of diameter 4.

Conversely, suppose G{S) has a BDST of degree B and diameter no more 
than 4. By construction of G{S), it is impossible for a tree of diameter 3 or less 
to span it; hence we may assume that the diameter of the tree is exactly 4. In 
this case, the tree must have a node such that all other nodes in the graph are at 
distance no more than 2 from it. We call this the center of the tree. Since there 
are at least two extra nodes and the only nodes that are within a distance of at 
most 2 from the extra nodes are ROOT, then the center must be at ROOT. 

If a BDST centered at ROOT spans the entire graph with paths of length 
no more than 2, then it must reach all the elements via sets which include those 
elements, and also it must reach the “extra nodes” directly. The construction of 
G{S) ensures that these paths induce a valid set cover, and the degree constraint 
ensures that this set cover has size no more than C. Hence if G{S) has a BDST 
of degree B and diameter 4, then it has a set cover of size no more than C. 

Since SET COVER is NP-hard [7], so is BDST. Clearly BDST is in NP, since it is easy to check whether a tree has diameter at most $\Delta$ and degree at most $B$.
We conclude as follows: 
Corollary 2. BDST is NP-complete.



Corollary 3. If P ≠ NP, then there is no approximation algorithm with performance guarantee of less than | .

The reduction is not approximation preserving, so even though set cover cannot be approximated to within a factor better than logarithmic, such a result is not implied for BDST.
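Membership in NP rests on the fact that a candidate tree can be checked quickly. As an illustration only (this is not code from the paper), the following Python sketch checks a tree against a degree bound `B` and a bound `diameter_bound` on its weighted diameter; the function name and the input format (nodes `0..n-1`, an edge list, and a distance matrix `dist` holding the metric) are our own assumptions.

```python
from collections import defaultdict


def is_valid_bdst(n, tree_edges, dist, B, diameter_bound):
    """Return True if tree_edges is a spanning tree on nodes 0..n-1 whose
    maximum degree is at most B and whose weighted diameter (edge lengths
    taken from the metric dist) is at most diameter_bound."""
    if len(tree_edges) != n - 1:
        return False
    adj = defaultdict(list)
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    if any(len(adj[u]) > B for u in range(n)):
        return False

    def distances_from(src):
        # Iterative DFS; in a tree the unique path gives the tree distance.
        seen = {src: 0}
        stack = [src]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + dist[u][w]
                    stack.append(w)
        return seen

    diameter = 0
    for s in range(n):
        d = distances_from(s)
        if len(d) != n:  # n-1 edges but disconnected: not a spanning tree
            return False
        diameter = max(diameter, max(d.values()))
    return diameter <= diameter_bound
```

One traversal per node gives an $O(n^2)$ check, which is clearly polynomial.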

4 Open Questions 

In some situations, rather than bounding the diameter of the tree, it is required 
to bound the dilation of every pair of nodes. The dilation of a pair of nodes 
is defined as the ratio between their distance in the tree and their distance in 
the original metric. An approximation algorithm for degree bounded minimum 
dilation spanning trees is still open. This problem is closely related to the well- 
studied problem of approximating a general metric space by a tree metric. 
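For concreteness, here is a small sketch (our own, assuming the same conventions as a distance matrix `dist` and a tree edge list on nodes `0..n-1`) that computes the maximum dilation of a spanning tree over all pairs of nodes, following the definition above.

```python
def max_dilation(n, tree_edges, dist):
    """Largest ratio tree_distance(u, v) / dist[u][v] over all pairs u != v.

    Assumes tree_edges is a spanning tree on 0..n-1 and dist is a metric with
    dist[u][v] > 0 for u != v.
    """
    adj = {u: [] for u in range(n)}
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    # By the triangle inequality every ratio is at least 1.
    worst = 1.0
    for s in range(n):
        # Weighted distances from s inside the tree (iterative DFS).
        d = {s: 0.0}
        stack = [s]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + dist[u][w]
                    stack.append(w)
        for t in range(n):
            if t != s:
                worst = max(worst, d[t] / dist[s][t])
    return worst
```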

Our algorithm crucially uses the fact that the input graph is a complete metric. In particular, our algorithm does not work if we are given an (incomplete) input graph and a metric induced by the lengths of its edges (and we are required to use only edges from the input graph). Thus, an improvement over the bicriteria $(O(\log n), O(\log^2 n))$ approximation algorithm from [12] for this case is still open.
Abstract. We show that it is NP-hard to $2^{n^k}$-approximate the integral of a positive, smooth, polynomial-time computable $n$-variate function, for any fixed integer $k$.

1 Introduction 

Suppose $F(\cdot)$ is a real positive function defined on a cube $C$ in Euclidean $n$-dimensional space $\mathbb{R}^n$. We consider the problem of approximating the integral $I(F)$ of $F$ over $C$, with relative error $\epsilon$, under the additional assumption that $F$ satisfies a smoothness condition.

The exact integration of multivariate functions is hard, under the widely conjectured hardness of #P, given the result in [3], which implies that the exact calculation of the volume of an $n$-dimensional polytope is #P-complete. In view of this, we would like to address the question of whether there is an algorithm that returns a value $V$ such that $1/(1+\epsilon) \le I(F)/V \le 1+\epsilon$, in other words an algorithm that $\epsilon$-approximates $I(F)$.

The first somewhat surprising answer to this question came with the major result of Dyer, Frieze and Kannan ([5]), who showed that there is a fully polynomial randomized approximation scheme (FPRAS) for the volume of an $n$-dimensional convex body. More precisely, they showed that the volume of an $n$-dimensional convex body $K$, given by a weak membership oracle $\mathcal{M}$, can be $\epsilon$-approximated with failure probability $\delta$ using $\mathrm{poly}(n, \epsilon^{-1}, \log \delta^{-1})$ calls to $\mathcal{M}$. Here, $\mathcal{M}$ can be thought of as a black-box algorithm that decides whether a given point is in $K$. This directly implies that there is an FPRAS for the integration of $n$-variate concave functions that can be evaluated in time $\mathrm{poly}(n)$ at any point in the cube $C$.

Subsequently, Applegate and Kannan ([2]) extended this result to positive, smooth and nearly log-concave functions. Define

$$f(X) = \ln F(X)$$

and let $c$ be the edge length of $C$, $t(n)$ be an upper bound on the time needed to evaluate $F$ at any point in $C$, and $\alpha$, $\beta$ satisfy

$$|f(x) - f(y)| \le \alpha \max_i |x_i - y_i| \qquad (1)$$

$$f(\lambda x + (1-\lambda) y) \ge \lambda f(x) + (1-\lambda) f(y) - \beta \qquad (2)$$

for all $x, y \in C$ and $\lambda \in [0, 1]$. Their algorithm has running time




It can be seen that $\alpha$ measures the smoothness of $F$. This gives rise to the following definition of smoothness.

Definition 1. A function $F(\cdot)$ is called $k$-smooth if it satisfies $\alpha \le n^k$. We denote by $S_k$ the set of $k$-smooth functions, and by $S = \bigcup_k S_k$ the set of smooth functions.



If $\beta = 0$, the function is log-concave (i.e., its logarithm is concave), so $\beta$ can be viewed as a measure of the distance of $F$ from log-concavity. The natural question is whether the dependence on $\beta$ can be removed or somewhat alleviated. The contribution of this paper is to show that for any fixed integer $k$, it is NP-hard to $2^{n^k}$-approximate the integral of positive smooth functions that are computable in polynomial time. In fact, we show that even relatively small improvements in the dependence on $\beta$ would imply unexpected (and rather indirect) algorithmic improvements for well-studied NP-complete problems. Formally, we show the following.

Theorem 1. For any fixed integer $k > 3$, if there is a (randomized) $2^{n^k}$-approximation algorithm with time complexity 0{poly(a)2^^^^) for the problem of integration of functions from $S_{k+3}$, then there is a 0{poly{a)n^^^'^^^ ) (randomized) algorithm for the Hamilton Path problem on graphs with $n$ vertices.



Corollary 1. For any fixed integer $k$, it is NP-hard to $2^{n^k}$-approximate the integral of polynomial-time computable functions from $S$.

We note here that, in general, only a few negative results concerning the ap- 
proximability of counting problems are known. As observed in [6], the hardness 
of counting problems in most cases follows either from the NP-completeness 
of the corresponding decision problem, or from applying some “boosting” re- 
duction which exploits an embedded NP-complete problem (see [10,6]). There 
appears to be a paucity of results that prove the hardness of approximate count- 
ing problems for some other more “interesting” reason. One such case is [4], 
which proves that there is no FPRAS for counting the number of independent 
sets in graphs of maximum degree $\Delta \ge 25$, unless NP = RP. As noted in [7],
in view of the lack of “satisfactory” results that prove inapproximability under 
reasonable complexity-theoretical assumptions, research efforts have often been 
directed towards proving that certain restricted algorithmic approaches fail (see 
section 4 of [7] and the references therein). 

The rest of the paper is organized as follows. In Section 2 we give an overview of the proof technique, in Section 3 we give the details of the proof, and finally in Section 4 we make some concluding comments.

2 Overview

We derive the result through a reduction from Hamilton Path (HP for short). 
Recall that HP is one of the first problems shown to be NP-complete (see [9]). 
Given a graph G (in some usual representation), HP asks whether there exists a 
simple path of length n, i.e. a path that goes through every vertex of G exactly 
once. 

With every graph $G$, we associate a function $F_G$. If $G$ has $n$ vertices, $F_G$ is a function of $n^2$ variables. The function $F_G$ has the following useful characteristics. It can be computed at any point $X$ in a cube $C$ of interest in time polynomial in $n$. The parameters $\alpha$, $\beta$ of $F_G$ (defined in inequalities (1) and (2)) are polynomial in $n$. Also, the value of the integral of $F_G$ depends on whether $G$ contains a Hamilton Path or not. Specifically, if there is a HP, the integral of $F_G$ over a cube $C$ of constant edge size $c$ is lower bounded by an explicitly known quantity $I_H$. If not, it is upper bounded by $I_{NH}$, with $I_H/I_{NH} > 2^{n^k}$, for any fixed constant $k$. It follows that $2^{n^k}$-approximating the integral is NP-hard.
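The last step is a standard thresholding argument, sketched below under our own conventions: all quantities are passed as base-2 logarithms to avoid overflow, `rho` denotes the approximation factor (the returned value $V$ satisfies $I/\rho \le V \le \rho I$), and the bounds $I_H$ and $I_{NH}$ are assumed to be known. Comparing $V$ with the geometric mean of the two bounds is guaranteed to give the right answer once $I_H/I_{NH} > \rho^2$; for $\rho = 2^{n^k}$ this only needs the gap bound for a slightly larger fixed $k$.

```python
def decide_hamiltonian(log2_V, log2_IH, log2_INH, log2_rho):
    """Decide 'G has a Hamilton path' from a rho-approximation V of I(F_G).

    All arguments are base-2 logarithms. Assumes I(F_G) >= I_H if G has a
    Hamilton path, I(F_G) <= I_NH otherwise, and I / rho <= V <= rho * I.
    """
    if log2_IH - log2_INH <= 2 * log2_rho:
        raise ValueError("gap I_H / I_NH must exceed rho**2")
    # Threshold at the geometric mean of the two bounds.
    threshold = (log2_IH + log2_INH) / 2
    return log2_V > threshold


# Worst cases with log2 I_H = 100, log2 I_NH = 10, log2 rho = 20:
print(decide_hamiltonian(100 - 20, 100, 10, 20))  # True  (V as small as possible)
print(decide_hamiltonian(10 + 20, 100, 10, 20))   # False (V as large as possible)
```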

Also, since $\beta = O(n^d)$ for some constant $d$ (the smallest value of $d$ we are able to exhibit in this paper is 6), an improvement of the running time of the

integration algorithm to poly{n,e~^,a,2^^ ^ ), for any F > 0, would give a 

$2^{o(n)}$ randomized algorithm for Hamilton Path (the best currently known upper bound is $O(2^n)$, see [1]), and, through the Sparsification Lemma of [8], a $2^{o(n)}$ randomized algorithm for 3-SAT, where now $n$ is the number of variables.



3 The Proof 

3.1 Definition and Properties of the Function 

Let $G$ be a graph with $n$ vertices and $\mathcal{P}$ be the set of length-$n$ paths of $G$. The function $F_G(X)$ is a function of $n^2$ variables, $X = \{x_{11}, \ldots, x_{nn}\}$. Each path $p \in \mathcal{P}$ is associated with a term $f_p(X)$, and $F_G(X) = \sum_{p \in \mathcal{P}} f_p(X)$.

We now describe the term $f_p(X)$ for a path $p$. Assume an arbitrary numbering of the graph vertices with numbers in $[n]$. We consider $p$ as an ordered set of vertices $v_1, \ldots, v_n$, where $v_i \in [n]$. We let $m = n^k$, where $k$ is an integer constant to be discussed later. We define



$$f_p(X) = \prod_{i=1}^{n} g_i(X)$$



with 



g^{x) 




m 

Vij 



We will integrate $F_G$ over the cube $C = [1, c]^{n^2}$, so we study its properties in this cube. Each term $f_p(X)$ is increasing in the variables appearing in the numerator and decreasing in the variables appearing in the denominator. By setting the former to $c$ and the latter to 1, we get that the maximum value
of fp{X) is 0(c" ™). Since there are at most n! paths, it follows that for any 
X £ C, Fc{X) can be expressed with 0{mn^logn) bits. 

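As noted in [2], the smoothness parameter $\alpha$ can be upper bounded by

$$\alpha \le n^2 \max_{X \in C,\, x_i \in X} \left| \frac{\partial}{\partial x_i} \ln F_G(X) \right| = n^2 \max_{X \in C,\, x_i \in X} \left| \frac{\sum_{p \in \mathcal{P}} \partial f_p(X) / \partial x_i}{\sum_{p \in \mathcal{P}} f_p(X)} \right| \qquad (3)$$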



Let $x_i \in X$ be any variable. Since the exponent of $x_i$ is at most $nm$, for all points $X$ in $C$ we have

$$\left| \frac{\partial f_p(X)}{\partial x_i} \right| \le nm\, f_p(X),$$

which, combined with inequality (3), gives $\alpha \le n^3 m$.

A note about the algorithm of [2] is due here. The algorithm operates on a 
grid imposed on C. The coordinates of the grid are multiples of 7 < Xj^a. From 
the bound on $\alpha$ it follows that we are interested in evaluating $F_G$ at points which are rationals expressible in polynomial space. From the definition of $F_G$, its value at any point of the grid is also a rational expressible in polynomial space.

The definition of (3 trivially implies that any upper bound for f{X) is also an 
upper bound for j3. From the above analysis we get (3 < 0(mn^ log n) . For a lower 
bound on (3 note that f{X) can be written as f{X) = In P{X) — mJ2i In Xij, 
where P{X) is a multivariate polynomial. Since P{X) is not log-concave in 
general, the value of (3 can be lower bounded from the value of j3 for the function 
f{X) = — jG[n] Inxij, which can be seen to be 0{mn^). Thus, we get /? > 
mn^. 
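Inequality (2) can be probed numerically. The sketch below is our own illustration, not the paper's analysis: it evaluates the log-concavity defect $\lambda f(x) + (1-\lambda) f(y) - f(\lambda x + (1-\lambda) y)$ along the segment between two opposite corners of the cube, and its maximum over a grid of $\lambda$ is a lower bound on $\beta$. As a test function it uses $f(X) = -m \sum_i \ln x_i$ over $N$ variables (our choice of example), which is log-convex rather than log-concave, so the defect is positive and scales linearly in both $m$ and $N$.

```python
import math


def log_concavity_defect(f, x, y, lam):
    """lam*f(x) + (1-lam)*f(y) - f(lam*x + (1-lam)*y).

    With f = ln F, inequality (2) says this never exceeds beta, so any
    positive value is a witness that beta must be at least that large.
    """
    z = [lam * a + (1 - lam) * b for a, b in zip(x, y)]
    return lam * f(x) + (1 - lam) * f(y) - f(z)


def beta_lower_bound(f, x, y, steps=1000):
    """Largest defect on a grid of lambda values in (0, 1) along segment x-y."""
    return max(log_concavity_defect(f, x, y, i / steps) for i in range(1, steps))


def log_convex_example(m):
    """f(X) = -m * sum(ln x_i), i.e. F(X) = prod_i x_i**(-m)."""
    return lambda X: -m * sum(math.log(v) for v in X)


c, N = 4.0, 3
lo, hi = [1.0] * N, [c] * N  # opposite corners of the cube [1, c]^N
b1 = beta_lower_bound(log_convex_example(1), lo, hi)
b5 = beta_lower_bound(log_convex_example(5), lo, hi)
print(b1, b5)  # b5 equals 5 * b1 up to rounding
```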

We finally note that $F_G(X)$ has some additional interesting properties. First, $F_G$ has derivatives of any order everywhere in the cube $C$. Also, its form is relatively simple, as it is a sum of multivariate rational functions. In addition, given a graph $G$ we can easily obtain a closed form for the integral of $F_G$, though of exponential length.



3.2 A Polynomial Time Algorithm for the Evaluation of $F_G$

We give an algorithm that computes $F_G(X)$ at any point $X$ in $n$ time steps. We extend the definition of the path terms to paths of length $t$. Concretely, we let

$$f_p(X) = \prod_{i=1}^{t} g_i(X)$$



with 



g^{x) 




771 

Vij 
Let $\mathcal{P}_t(v)$ be the set of paths of length $t$ that end in node $v$. Also, let $Q_t(v) = \sum_{p \in \mathcal{P}_t(v)} f_p(X)$. Inductively, assume that just before time step $t$, for every $v \in V$ we have computed

$$Q_{t-1}(v) = \sum_{p \in \mathcal{P}_{t-1}(v)} f_p(X).$$

Let $N(v)$ denote the set of neighbors of node $v$. At step $t$, for each node $v$ we compute



After $n$ steps the quantities $Q_n(v)$ have been computed for all vertices $v \in V$. Then,

$$F_G(X) = \sum_{v \in V} Q_n(v).$$

The computation of $Q_t(v)$ requires a polynomial number of operations. Since there are $n$ steps and $n$ vertices, it follows that $F_G$ can be computed with a polynomial number of operations. The points we are interested in are rationals expressible in polynomial space, and, from the observations of the previous subsection, all the intermediate quantities are expressible in polynomial space. It follows that $F_G(X)$ can be evaluated exactly, at any point $X \in C$, in time polynomial in $n$.
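The recurrence has the same shape as counting walks in a graph. Since the factors $g_t$ are not spelled out here, the following sketch makes a loudly hypothetical simplification: the $t$-th factor is a weight `w(t, v)` that depends only on the step $t$ and the current vertex $v$, so that $Q_t(v) = w(t, v) \sum_{u \in N(v)} Q_{t-1}(u)$. It uses exact rational arithmetic (Python's `fractions.Fraction`) and checks the dynamic program against brute-force enumeration of all vertex sequences, including non-simple ones.

```python
from fractions import Fraction
from itertools import product


def evaluate_by_dp(adj, n_steps, w):
    """Sum over all walks (v_1, ..., v_T), T = n_steps, of prod_t w(t, v_t).

    adj: dict vertex -> list of neighbours (undirected graph).
    w: HYPOTHETICAL per-step weight standing in for the factor g_t.
    Q[v] holds the sum over walks of the current length that end at v.
    """
    Q = {v: Fraction(w(1, v)) for v in adj}
    for t in range(2, n_steps + 1):
        # Every walk ending at v extends a walk ending at some neighbour u.
        Q = {v: w(t, v) * sum((Q[u] for u in adj[v]), Fraction(0)) for v in adj}
    return sum(Q.values(), Fraction(0))


def evaluate_by_enumeration(adj, n_steps, w):
    """Exponential-time reference: enumerate every vertex sequence."""
    total = Fraction(0)
    for walk in product(list(adj), repeat=n_steps):
        if all(walk[s + 1] in adj[walk[s]] for s in range(n_steps - 1)):
            term = Fraction(1)
            for t, v in enumerate(walk, start=1):
                term *= w(t, v)
            total += term
    return total


adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
weight = lambda t, v: Fraction(t + v + 1, v + 2)  # arbitrary positive rationals
print(evaluate_by_dp(adj, 4, weight) == evaluate_by_enumeration(adj, 4, weight))
```

With unit weights the same code simply counts walks; on this graph it returns 38 for walks with four vertices.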

3.3 Bounding the Integrals

We integrate $F(X)$ over the cube $C = [1, c]^{n^2}$. Let $dX = dx_{11} \cdots dx_{nn}$ and let $\pi$ be a permutation of the variable names. Since

$$\int_C F(X)\, dX = \sum_{p \in \mathcal{P}} \int_C f_p(X)\, dX,$$

we can consider the integral of each path separately. We will refer to the value of the integral of a term corresponding to a path $p$ as the integral of $p$. Also, since

$$\int_C f_p(X)\, dX = \int_C f_p(\pi(X))\, dX,$$

we can rename the variables in any term of $F$. It is then easy to see that the integral of a path depends only on the structure of the path and not on the particular vertices appearing on it.
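For a single monomial both facts are easy to check, since the integral over $C$ factorizes into one-dimensional integrals $\int_1^c x^a\,dx = (c^{a+1}-1)/(a+1)$ (or $\ln c$ when $a = -1$). The sketch below assumes, for illustration only, that a term is a plain monomial given by its list of exponents; the exact form of the paper's terms $f_p$ is not reproduced here. It compares the closed form with a midpoint-rule estimate and shows that permuting the exponents (renaming variables) leaves the value unchanged.

```python
import math


def integral_1d(a, c):
    """Exact integral of x**a over [1, c] (c > 1)."""
    if a == -1:
        return math.log(c)
    return (c ** (a + 1) - 1) / (a + 1)


def monomial_integral(exponents, c):
    """Integral of prod_i x_i**a_i over the cube [1, c]^N, by Fubini."""
    result = 1.0
    for a in exponents:
        result *= integral_1d(a, c)
    return result


def midpoint_estimate(exponents, c, k=200):
    """Tensor-product midpoint rule; for a product integrand it factorizes too."""
    h = (c - 1) / k
    nodes = [1 + (s + 0.5) * h for s in range(k)]
    result = 1.0
    for a in exponents:
        result *= h * sum(x ** a for x in nodes)
    return result


exps, c = [2, -1, -3], 2.0
print(monomial_integral(exps, c))
print(monomial_integral([-3, 2, -1], c))  # renaming variables: same value
print(midpoint_estimate(exps, c))         # close to the exact value
```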

We first consider the integral of a HP. Since HP is a simple path, there are 
no cancellations of variables and its integral is 






peVt-i(v) 




F{X) = Y Qr.{v) 



vev 





1 )" 



Ihp 
Let us now consider the integrals of non-simple paths. Suppose a path $p$ goes through $n - d$ distinct nodes. Then the corresponding term $f_p$ is of the form



fp{X) 




' n(n+l) l2—t 

^o.im 

i=n-d-\-l 



X, 



where $t$ and the $a_i$ are integers that depend on the structure of $p$. In this case, $d$ monomials in the denominator cancel with variables in the numerator, so that



n(n— 1) j2—t 

ai = n{n — 1) /2 — d 



By integrating, we get 

n{n— 1) j2—t 

fp{X) < (1 - - l)"('^+l)/2+t ^a,m+l 

i=n— d+1 

Now suppose we are given a non-Hamiltonian graph. Since there are at most
n! < c" paths in the graph, the integral of the associated function is 

Inh < 

On the other hand, if the given graph is Hamiltonian (and even if we consider 
only the integral of the HP), the integral of the associated function is 

> ^-0{rd\o^n)^mn(n-l)/2 

which gives a large gap, namely 

Ih *> m—0{n^ logn) 

Inh ~ 

Recall that $m = n^k$. By taking any fixed $k > 3$ we get Theorem 1.

4 Conclusions 

We showed that it is NP-hard to $2^{n^k}$-approximate the integral of smooth positive $n$-variate functions, for any fixed integer $k$. We also argued that the currently best known integration algorithm cannot be substantially improved, unless there exist faster algorithms for Hamilton Path and 3-SAT.

Note that the $2^{n^k}$-inapproximability holds for $(k+3)$-smooth functions, with $k > 3$. Also, in order to obtain the full range of our inapproximability result, we make use of functions that progressively become less efficiently computable. It is an interesting question whether similar inapproximability properties can be shown for classes of functions with different trade-offs between their evaluation time complexity and the values of their $\alpha$, $\beta$ parameters.

We feel that the most interesting open question is whether a lower bound can be proved on $\beta$ for any smooth, polynomially computable function $F_G$ that can be constructed using the techniques of this paper.
Abstract. This paper is divided into two parts. In the first part of this paper, we present a 2-approximation algorithm for the soft-capacitated facility location problem. This achieves the integrality gap of the natural LP relaxation of the problem. The algorithm is based on an improved analysis of an algorithm for the linear facility location problem, and a bifactor approximate reduction from the soft-capacitated facility location problem to this problem. We will define and use the concept of bifactor approximate reductions to improve the approximation factor of several other variants of the facility location problem. In the second part of the paper, we present an alternative analysis of the authors' 1.52-approximation algorithm for the uncapacitated facility location problem, using a single factor-revealing LP. This answers an open question of [16]. Furthermore, this analysis, combined with a recent result of Thorup [21], shows that our algorithm can be implemented in quasi-linear time, achieving the best known approximation factor in the best possible running time.



1 Introduction 

Variants of the facility location problem (FLP) have been studied extensively in the operations research and management science literature and have received considerable attention in the area of approximation algorithms [17]. In the metric uncapacitated facility location problem (UFLP), which is the most basic facility location problem, we are given a set $\mathcal{F}$ of facilities, a set $\mathcal{C}$ of cities (a.k.a. clients), a cost $f_i$ for opening facility $i \in \mathcal{F}$, and a connection cost $c_{ij}$ for connecting client $j$ to facility $i$. The objective is to open a subset of the facilities in $\mathcal{F}$ and connect each city to an open facility so that the total cost is minimized. We assume that the connection costs are metric, meaning that they are symmetric and satisfy the triangle inequality.

Since the first constant factor approximation algorithm due to Shmoys, Tardos and Aardal [18], a large number of approximation algorithms have been proposed for the UFLP [19, 11, 12, 15, 20, 13, 1, 3, 4, 5, 8, 13, 14], and the current best known approximation factor is 1.52, given by Mahdian, Ye and Zhang [16]. Guha and Khuller [8] proved that it is impossible to get an approximation guarantee of 1.463 for the UFLP, unless NP ⊆ DTIME[$n^{O(\log \log n)}$].

* Research supported in part by NSF grant DMI-0231600.

The growing interest in the UFLP stems not only from its applications in a large number of settings [7], but also from the fact that the UFLP is one of the most basic models among discrete location problems. The insights gained in dealing with the UFLP may also apply to more complicated location models, and in many cases the latter can be reduced directly to the UFLP.

In this paper, we give a 2-approximation algorithm for the soft-capacitated facility location problem (SCFLP) by reducing it to the UFLP. The SCFLP is similar to the UFLP, except that there is a capacity $u_i$ associated with each facility $i$, which means that if we want this facility to serve $x$ cities, we have to open it $\lceil x/u_i \rceil$ times at a cost of $f_i \lceil x/u_i \rceil$. This problem is also known as the facility location problem with integer decision variables in the operations research literature (see [2]). Chudak and Shmoys [6] gave a 3-approximation algorithm for the SCFLP with uniform capacities (i.e., $u_i = u$ for all $i \in \mathcal{F}$) using LP rounding. For non-uniform capacities, Jain and Vazirani [13] showed how to reduce this problem to the UFLP, and by solving the UFLP through a primal-dual algorithm, they obtained a 4-approximation. A local search algorithm proposed by Arya et al. [1] has an approximation ratio of 3.72. Following the approach of Jain and Vazirani [13], Jain, Mahdian, and Saberi [12] showed that the SCFLP can be solved within a factor of 3. This result was further improved by the authors [16] to a 2.89-approximation for the SCFLP, which is the best previously known algorithm for this problem. We improve this factor to 2, achieving the integrality gap of the natural LP relaxation of the problem. The main idea of our algorithm is to consider algorithms and reductions that have separate (not necessarily equal) approximation factors for facility and connection costs. We will define the concept of bifactor approximate reduction in this paper, and show how it can be used to get an approximation factor of 2 for the SCFLP. We will also generalize our algorithm to a common generalization of the SCFLP and the concave-cost FLP. The idea of using bifactor approximation algorithms and reductions can be used to improve the approximation factor of several other problems in a straightforward manner.

In the second part of this paper, we present an alternative analysis for the 
1.52-approximation algorithm for the UFLP [16] using a single factor-revealing 
LP. This answers an open question of [16]. Furthermore, this analysis shows that 
the second phase of the 1.52 algorithm can be implemented in quasi-linear time. 
This, together with a recent result of Thorup [21], proves that our algorithm can
be implemented in quasi-linear time, achieving the best known approximation 
factor in essentially the best possible running time. 

The rest of this paper is organized as follows. In Section 2 we present the necessary definitions and notation. In Section 3 we present a lemma on the approximability of the linear-cost facility location problem. In Section 4 we define the concept of bifactor approximate reductions between facility location problems, and present an algorithm for the SCFLP and a common generalization of the SCFLP and the concave-cost FLP using the lemma proved in Section 3 and a bifactor reduction from the SCFLP to the linear-cost FLP. Then, in Section 5, we present a new analysis of the 1.52-approximation algorithm for the UFLP and show how it leads to an implementation in quasi-linear time.



2 Definitions and Notations 

In this paper, we will define reductions between various facility location prob- 
lems. Many such problems can be considered as special cases of the generalized 
facility location problem, as defined below. This problem was first defined and 
studied in [9]. 

Definition 1. In the metric generalized facility location problem, we are given a set $\mathcal{C}$ of $n_c$ cities, a set $\mathcal{F}$ of $n_f$ facilities, a connection cost $c_{ij}$ between city $j$ and facility $i$ for every $i \in \mathcal{F}, j \in \mathcal{C}$, and a facility cost function $f_i : \{0, \ldots, n_c\} \to \mathbb{R}^+$ for every $i \in \mathcal{F}$. Connection costs are symmetric and obey the triangle inequality. The value of $f_i(k)$ equals the cost of opening facility $i$ if it is used to serve $k$ cities. A solution to the problem is a function $\phi : \mathcal{C} \to \mathcal{F}$ assigning each city to a facility. The facility cost $F_\phi$ of the solution $\phi$ is defined as $\sum_{i \in \mathcal{F}} f_i(|\{j : \phi(j) = i\}|)$, i.e., the total cost of opening facilities. The connection cost (a.k.a. service cost) $C_\phi$ of $\phi$ is $\sum_{j \in \mathcal{C}} c_{\phi(j) j}$, i.e., the total cost of connecting each city to its assigned facility. The objective is to find a solution $\phi$ that minimizes the sum $F_\phi + C_\phi$.

Now we can define uncapacitated and soft-capacitated facility location prob- 
lems as special cases of the generalized FLP: 

Definition 2. The metric uncapacitated facility location problem (UFLP) is a special case of the generalized FLP in which all facility cost functions are of the following form: for each $i \in \mathcal{F}$, $f_i(k) = 0$ if $k = 0$, and $f_i(k) = f_i$ if $k > 0$, where $f_i$ is a constant (which is called the facility cost of $i$).



Definition 3. The metric soft-capacitated facility location problem (SCFLP) is a special case of the generalized FLP in which all facility cost functions are of the form $f_i(k) = f_i \lceil k/u_i \rceil$, where $f_i$ and $u_i$ are constants for every $i \in \mathcal{F}$. $u_i$ is called the capacity of facility $i$.
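Definitions 1 to 3 translate directly into a cost evaluator. In the sketch below (names and input format are our own assumptions), a facility cost function is any Python callable `k -> f_i(k)`, so the UFLP and the SCFLP are obtained just by plugging in different callables.

```python
def ceil_div(k, u):
    # Integer ceiling of k / u for positive u, avoiding floating point.
    return -(-k // u)


def solution_cost(phi, facility_cost, conn):
    """(facility cost, connection cost) of an assignment phi in the generalized FLP.

    phi: dict city -> facility it is assigned to.
    facility_cost: dict facility i -> function k -> f_i(k).
    conn: dict (facility, city) -> connection cost c_ij.
    """
    load = {i: 0 for i in facility_cost}
    for i in phi.values():
        load[i] += 1
    F = sum(f_i(load[i]) for i, f_i in facility_cost.items())
    C = sum(conn[(i, j)] for j, i in phi.items())
    return F, C


def uflp(f):
    """Definition 2: cost 0 if unused, the constant f otherwise."""
    return lambda k: 0 if k == 0 else f


def scflp(f, u):
    """Definition 3: f * ceil(k / u), i.e. one copy per u cities served."""
    return lambda k: f * ceil_div(k, u)


# Usage: three cities, facilities A and B in use, facility C unused.
phi = {1: "A", 2: "A", 3: "B"}
conn = {("A", 1): 1, ("A", 2): 2, ("B", 3): 3}
print(solution_cost(phi, {"A": uflp(5), "B": uflp(7), "C": uflp(4)}, conn))  # (12, 6)
print(solution_cost(phi, {"A": scflp(5, 1), "B": scflp(7, 2), "C": scflp(4, 1)}, conn))  # (17, 6)
```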

The 1.52-approximation algorithm of Mahdian, Ye, and Zhang [16] is built 
upon an earlier approximation algorithm of Jain, Mahdian, and Saberi [12]. We 
denote these two algorithms by the MYZ and the JMS algorithms, respectively. 
The analyses of both of these algorithms have the feature that they allow the approximation factor for the facility cost to be different from the approximation factor for the connection cost, and they give a way to compute the trade-off between these two factors. The following definition captures this notion.
Definition 4. An algorithm is called a $(\gamma_f, \gamma_c)$-approximation algorithm for the generalized FLP if, for every instance $\mathcal{I}$ of the generalized FLP and for every solution SOL for $\mathcal{I}$ with facility cost $F_{SOL}$ and connection cost $C_{SOL}$, the cost of the solution found by the algorithm is at most $\gamma_f F_{SOL} + \gamma_c C_{SOL}$.

Recall the following theorem of Jain et al. [12] on the approximation factor 
of the JMS algorithm. 

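Theorem A [12]. Let $\gamma_f \ge 1$ be fixed and $\gamma_c := \sup_k \{z_k\}$, where $z_k$ is the solution of the following optimization program (which we call the factor-revealing LP).

$$
\begin{array}{lll}
\text{maximize} & \dfrac{\sum_{i=1}^{k} \alpha_i - \gamma_f f}{\sum_{i=1}^{k} d_i} & (1)\\[2ex]
\text{subject to} & \forall\, 1 \le i < k: \ \alpha_i \le \alpha_{i+1} & (2)\\
& \forall\, 1 \le j < i < k: \ r_{j,i} \ge r_{j,i+1} & (3)\\
& \forall\, 1 \le j < i \le k: \ \alpha_i \le r_{j,i} + d_i + d_j & (4)\\
& \forall\, 1 \le i \le k: \ \sum_{j=1}^{i-1} \max(r_{j,i} - d_j, 0) + \sum_{j=i}^{k} \max(\alpha_i - d_j, 0) \le f & (5)\\
& \forall\, 1 \le j \le i \le k: \ \alpha_j, d_j, f, r_{j,i} \ge 0 & (6)
\end{array}
$$

Then the JMS algorithm is a $(\gamma_f, \gamma_c)$-approximation algorithm for the UFLP.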

We will use the above theorem in this paper to give an alternative proof of 
the following theorem about the performance of the MYZ algorithm. 

Theorem B [16]. Let $(\gamma_f, \gamma_c)$ be a pair obtained from the above factor-revealing LP. Then for every $\delta \ge 1$, there is a $\left(\gamma_f + \ln(\delta) + \epsilon,\ 1 + \frac{\gamma_c - 1}{\delta}\right)$-approximation algorithm for the UFLP.
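Theorem B turns one pair from the factor-revealing LP into a family of guarantees parameterized by $\delta$. The sketch below (our own illustration, ignoring the additive $\epsilon$) evaluates that formula and, for illustration only, balances the two factors for the single pair $(\gamma_f, \gamma_c) = (1, 2)$, the pair behind Corollary 1. Balancing this pair gives roughly 1.567, so the 1.52 bound of [16] relies on other pairs from the trade-off.

```python
import math


def myz_pair(gamma_f, gamma_c, delta):
    """Bifactor guarantee of Theorem B, without the additive epsilon."""
    return gamma_f + math.log(delta), 1 + (gamma_c - 1) / delta


def balancing_delta(gamma_f, gamma_c, lo=1.0, hi=100.0, iters=200):
    """Bisection for the delta at which both factors of myz_pair coincide.

    Assumes gamma_c > 1 and that the facility factor is smaller than the
    connection factor at delta = lo and larger at delta = hi; their difference
    is increasing in delta, so bisection applies.
    """
    def gap(d):
        fac, con = myz_pair(gamma_f, gamma_c, d)
        return fac - con

    for _ in range(iters):
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


delta = balancing_delta(1.0, 2.0)
print(delta, myz_pair(1.0, 2.0, delta))  # both factors are about 1.567
```

At the balance point $\ln \delta = 1/\delta$, i.e. $\delta \ln \delta = 1$.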

3 The Linear-Cost Facility Location Problem 

The linear-cost facility location problem is a special case of the generalized FLP in which the facility costs are of the form

$$f_i(k) = \begin{cases} 0 & k = 0 \\ a_i k + b_i & k > 0 \end{cases}$$

where $a_i$ and $b_i$ are nonnegative values for each $i \in \mathcal{F}$; $b_i$ and $a_i$ are called the setup and marginal (a.k.a. incremental) cost of facility $i$, respectively.

We denote an instance of the linear-cost FLP with marginal costs $(a_i)$, setup costs $(b_i)$, and connection costs $(c_{ij})$ by $LFLP(a, b, c)$. Clearly, the regular UFLP is a special case of the linear-cost FLP with $a_i = 0$, i.e., $LFLP(0, b, c)$. Furthermore, it is straightforward to see that $LFLP(a, b, c)$ is equivalent to an instance of the regular UFLP in which the marginal costs are added to the connection costs. More precisely, let $c'_{ij} = c_{ij} + a_i$ for $i \in \mathcal{F}$ and $j \in \mathcal{C}$, and consider an instance of the UFLP with facility costs $(b_i)$ and connection costs $(c'_{ij})$. We denote this instance by $UFLP(b, c + a)$. It is easy to see that $LFLP(a, b, c)$ is equivalent to $UFLP(b, c + a)$, since each city served by facility $i$ contributes exactly $a_i$ to the cost in both formulations. Thus, the linear-cost FLP can be solved using any algorithm for the UFLP, and the overall approximation ratio will be the same. However, for applications in the next section, we need the bifactor approximation factors of the algorithm (as defined in Definition 4).
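The equivalence can be checked by brute force on a toy instance. A minimal sketch, with our own function names and a dictionary-based instance format, that compares the total cost of every assignment in $LFLP(a, b, c)$ and in $UFLP(b, c + a)$:

```python
from itertools import product


def lflp_cost(phi, a, b, c):
    """Total cost of assignment phi (city -> facility) in LFLP(a, b, c)."""
    load = {}
    for i in phi.values():
        load[i] = load.get(i, 0) + 1
    facility = sum(a[i] * k + b[i] for i, k in load.items())  # only used facilities
    connection = sum(c[(i, j)] for j, i in phi.items())
    return facility + connection


def uflp_instance(a, c):
    """Connection costs of UFLP(b, c + a): c'_ij = c_ij + a_i."""
    return {(i, j): cij + a[i] for (i, j), cij in c.items()}


def uflp_total_cost(phi, b, c_prime):
    opened = set(phi.values())
    return sum(b[i] for i in opened) + sum(c_prime[(i, j)] for j, i in phi.items())


def same_cost_everywhere(facilities, cities, a, b, c):
    """Brute-force check that every assignment costs the same in both instances."""
    c_prime = uflp_instance(a, c)
    for choice in product(facilities, repeat=len(cities)):
        phi = dict(zip(cities, choice))
        if lflp_cost(phi, a, b, c) != uflp_total_cost(phi, b, c_prime):
            return False
    return True
```

What the equivalence does not preserve is the split between facility and connection cost, which is why the bifactor guarantee needs the separate argument of Lemma 1.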

It is not necessarily true that applying a $(\gamma_f, \gamma_c)$-approximation algorithm for the UFLP on the instance $UFLP(b, a + c)$ will give a $(\gamma_f, \gamma_c)$-approximate solution for $LFLP(a, b, c)$, since the marginal costs, which are part of the facility cost in $LFLP(a, b, c)$, are counted as connection costs in $UFLP(b, a + c)$. However, we will show that the JMS algorithm has this property. The following lemma, whose proof is presented in Appendix A, generalizes Theorem A to the linear-cost FLP.

Lemma 1. Let $(\gamma_f, \gamma_c)$ be a pair obtained from the factor-revealing LP in Theorem A. Then applying the JMS algorithm on the instance $UFLP(b, a + c)$ will give a $(\gamma_f, \gamma_c)$-approximate solution for $LFLP(a, b, c)$.

The above lemma and Theorem 9 in [12] give us the following corollary, which 
will be used in the next section. 

Corollary 1. There is a $(1, 2)$-approximation algorithm for the linear-cost facility location problem.

It is worth mentioning that the MYZ algorithm can also be generalized for the 
linear-cost FLP. The only trick is to scale up both $a$ and $b$ in the first phase by a factor of $\delta$, and scale them both down in the second phase. The rest of the proof is almost the same as the proof of Lemma 1.

4 The Soft-Capacitated Facility Location Problem 

In this section we will show how the soft-capacitated facility location problem 
can be reduced to the linear-cost FLP. In Section 4.1 we define the concept 
of reduction between facility location problems. We will use this concept in 
Sections 4.2 and 4.3 to obtain approximation algorithms for the SCFLP and a 
generalization of the SCFLP and the concave-cost FLP. 



4.1 Reduction between Facility Location Problems 

A reduction from a facility location problem $\mathcal{A}$ to another facility location problem $\mathcal{B}$ is an efficient procedure $R$ that maps every instance $\mathcal{I}$ of $\mathcal{A}$ to an instance $R(\mathcal{I})$ of $\mathcal{B}$. This procedure is called a $(\sigma_f, \sigma_c)$-reduction if the following conditions hold.

1. For any instance $\mathcal{I}$ of $\mathcal{A}$ and any feasible solution for $\mathcal{I}$ with facility cost $F^{\mathcal{A}}$ and connection cost $C^{\mathcal{A}}$, there is a corresponding solution for the instance $R(\mathcal{I})$ with facility cost $F^{\mathcal{B}} \le \sigma_f F^{\mathcal{A}}$ and connection cost $C^{\mathcal{B}} \le \sigma_c C^{\mathcal{A}}$.

2. For any feasible solution for the instance $R(\mathcal{I})$, there is a corresponding feasible solution for $\mathcal{I}$ whose total cost is at most as much as the total cost of the original solution for $R(\mathcal{I})$. In other words, the facility location instance $R(\mathcal{I})$ is an over-estimate of the facility location instance $\mathcal{I}$.



Theorem 1. If there is a $(\sigma_f, \sigma_c)$-reduction from a facility location problem $\mathcal{A}$ to another facility location problem $\mathcal{B}$, and a $(\gamma_f, \gamma_c)$-approximation algorithm for $\mathcal{B}$, then there is a $(\gamma_f \sigma_f, \gamma_c \sigma_c)$-approximation algorithm for $\mathcal{A}$.

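Proof. On an instance $\mathcal{I}$ of the problem $\mathcal{A}$, we compute $R(\mathcal{I})$, run the $(\gamma_f, \gamma_c)$-approximation algorithm for $\mathcal{B}$ on $R(\mathcal{I})$, and output the corresponding solution for $\mathcal{I}$. In order to see why this is a $(\gamma_f \sigma_f, \gamma_c \sigma_c)$-approximation algorithm for $\mathcal{A}$, let SOL denote an arbitrary solution for $\mathcal{I}$, ALG denote the solution that the above algorithm finds, and $F^{\mathcal{P}}_{SOL}$ and $C^{\mathcal{P}}_{SOL}$ ($F^{\mathcal{P}}_{ALG}$ and $C^{\mathcal{P}}_{ALG}$, respectively) denote the facility and connection costs of SOL (ALG, respectively) when viewed as a solution for the problem $\mathcal{P}$ ($\mathcal{P} = \mathcal{A}, \mathcal{B}$). By the definition of $(\sigma_f, \sigma_c)$-reductions and $(\gamma_f, \gamma_c)$-approximation algorithms we have

$$F^{\mathcal{A}}_{ALG} + C^{\mathcal{A}}_{ALG} \le F^{\mathcal{B}}_{ALG} + C^{\mathcal{B}}_{ALG} \le \gamma_f F^{\mathcal{B}}_{SOL} + \gamma_c C^{\mathcal{B}}_{SOL} \le \gamma_f \sigma_f F^{\mathcal{A}}_{SOL} + \gamma_c \sigma_c C^{\mathcal{A}}_{SOL},$$

which completes the proof of the theorem. □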

We will see examples of reductions in the rest of this paper. 



4.2 The Soft-Capacitated Facility Location Problem

In this subsection, we give a 2-approximation algorithm for the soft-capacitated 
FLP by reducing it to the linear-cost FLP. 

Theorem 2. There is a 2-approximation algorithm for the soft-capacitated facility location problem.

Proof. We use the following reduction: Construct an instance of the linear-cost FLP, where we have the same sets of facilities and clients. The connection costs remain the same. However, the facility cost of the $i$th facility is $\left(1 + \frac{k-1}{u_i}\right) f_i$ if $k \ge 1$ and 0 if $k = 0$. Note that, for every $k \ge 1$,

$$\left\lceil \frac{k}{u_i} \right\rceil \le 1 + \frac{k-1}{u_i} \le 2 \left\lceil \frac{k}{u_i} \right\rceil.$$

Therefore, it is easy to see that this reduction is a $(2, 1)$-reduction. By Corollary 1, there is a $(1, 2)$-approximation algorithm for the linear-cost FLP, which together with Theorem 1 completes the proof. □
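A quick numerical sanity check of the two facts used in this proof may help. The sketch below (our own illustration) verifies the sandwich inequality by exhaustive search over small $k$ and $u_i$, computes the marginal cost $a_i = f_i/u_i$ and setup cost $b_i = f_i(1 - 1/u_i)$ corresponding to the linear cost $(1 + (k-1)/u_i) f_i$, and composes the factor pairs as in Theorem 1. Exhaustive checking over a finite range is not a proof; the inequality itself follows by writing $k - 1 = q u_i + r$ with $0 \le r < u_i$.

```python
from fractions import Fraction


def ceil_div(k, u):
    # Integer ceiling of k / u for positive u.
    return -(-k // u)


def linear_coefficients(f_i, u_i):
    """Marginal cost a_i and setup cost b_i of (1 + (k-1)/u_i) f_i for k >= 1."""
    a_i = Fraction(f_i, u_i)
    b_i = f_i - a_i
    return a_i, b_i


def sandwich_holds(u_max=12, k_max=60):
    """Check ceil(k/u) <= 1 + (k-1)/u <= 2*ceil(k/u) on a finite range."""
    for u in range(1, u_max + 1):
        for k in range(1, k_max + 1):
            linear = 1 + Fraction(k - 1, u)
            if not ceil_div(k, u) <= linear <= 2 * ceil_div(k, u):
                return False
    return True


def compose(reduction, algorithm):
    """Theorem 1: a (s_f, s_c)-reduction plus a (g_f, g_c)-algorithm."""
    (s_f, s_c), (g_f, g_c) = reduction, algorithm
    return g_f * s_f, g_c * s_c


print(sandwich_holds())         # True
print(compose((2, 1), (1, 2)))  # (2, 2): a 2-approximation
```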

We now illustrate that the following natural linear programming formulation 
of the SCFLP has an integrality gap of 2. This means that we cannot obtain a 
better approximation ratio using this LP relaxation as the lower bound. 
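$$
\begin{array}{lll}
\text{minimize} & \sum_{i \in \mathcal{F}} f_i y_i + \sum_{i \in \mathcal{F}} \sum_{j \in \mathcal{C}} c_{ij} x_{ij} & \\
\text{subject to} & \forall\, i \in \mathcal{F}, j \in \mathcal{C}: \ x_{ij} \le y_i & \\
& \forall\, i \in \mathcal{F}: \ \sum_{j \in \mathcal{C}} x_{ij} \le u_i y_i & \\
& \forall\, j \in \mathcal{C}: \ \sum_{i \in \mathcal{F}} x_{ij} = 1 & \\
& \forall\, i \in \mathcal{F}, j \in \mathcal{C}: \ x_{ij} \in \{0, 1\} & (8)\\
& \forall\, i \in \mathcal{F}: \ y_i \text{ is a nonnegative integer} & (9)
\end{array}
$$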

In a natural linear programming relaxation, we replace the constraints (8) and (9) by $x_{ij} \ge 0$ and $y_i \ge 0$. Here we observe that even if we only relax constraint (9), the integrality gap is 2. Consider an instance of the SCFLP that consists of only one potential facility $i$ and $k > 2$ clients. Assume that the capacity of facility $i$ is $k - 1$, the facility cost is 1, and all connection costs are 0. Clearly, the optimal integral solution has cost 2. However, after relaxing constraint (9), the optimal fractional solution has cost $1 + \frac{1}{k-1}$. Therefore, the integrality gap between the integer program and its relaxation is $\frac{2(k-1)}{k}$, which tends to 2 as $k$ tends to infinity.
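The example is small enough to check mechanically. A minimal sketch (the function name is ours) computing both optima exactly with rational arithmetic:

```python
from fractions import Fraction


def gap_instance(k):
    """Single facility (cost 1, capacity k-1), k clients, zero connection costs.

    Returns (integral optimum, optimum with only constraint (9) relaxed).
    """
    capacity = k - 1
    integral = -(-k // capacity)        # fewest integer copies covering k clients
    fractional = Fraction(k, capacity)  # smallest real y_i with k <= capacity * y_i;
                                        # it is >= 1, so x_ij <= y_i also holds
    return integral, fractional


for k in (3, 10, 1000):
    integral, fractional = gap_instance(k)
    print(k, integral, fractional, float(integral / fractional))
```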



4.3 The Concave Soft-Capacitated Facility Location Problem

In this subsection, we consider a common generalization of the soft-capacitated 
facility location problem and the concave-cost facility location problem. This 
problem, which we refer to as the concave soft-capacitated FLP, is the same 
as the soft-capacitated FLP except that if $r > 0$ copies of facility $i$ are open, then the facility cost is $g(r) a_i$, where $g(r)$ is a given concave function of $r$. In other words, the concave soft-capacitated FLP is a special case of the generalized FLP in which the facility cost functions are of the form $f_i(x) = a_i g(\lceil x/u_i \rceil)$ for constants $a_i$, $u_i$ and a concave function $g$. It is also a special case of the so-called stair-case cost facility location problem [10]. On the other hand, it is a common generalization of the soft-capacitated FLP (when $g(r) = r$) and the concave-cost FLP (when $u_i = 1$ for all $i$). The concave-cost FLP is a special case of the generalized FLP in which facility cost functions are required to be concave (see [9]). The main result of this subsection is the following.

Theorem 3. The concave soft-capacitated FLP is -reducible to the linear-cost FLP.

The proof of the above theorem is omitted here. The idea is to show that the 
concave soft-capacitated FLP is 1) reducible to the concave-cost FLP, and 
the latter is equivalent to the linear-cost FLP. Therefore, by Theorem 3, a good 
approximation algorithm for linear-cost FLP would imply a good approximation 
for the concave soft-capacitated FLP. 
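To make the cost structure concrete, here is a small sketch of the facility cost $f_i(x) = a_i g(\lceil x/u_i \rceil)$ for a facility serving x clients. The function name and the convention that serving no clients costs nothing are assumptions for illustration; passing $g(r) = r$ recovers the soft-capacitated FLP, and $u_i = 1$ recovers the concave-cost FLP.

```python
import math
from typing import Callable


def concave_soft_capacitated_cost(x: int, a: float, u: int, g: Callable[[int], float]) -> float:
    """Facility cost a * g(ceil(x / u)) for a facility that serves x clients.

    Assumption: serving no clients opens no copies and costs nothing.
    """
    if x <= 0:
        return 0.0
    copies = -(-x // u)          # ceil(x / u) using integer arithmetic
    return a * g(copies)


print(concave_soft_capacitated_cost(7, 5.0, 3, lambda r: r))   # soft-capacitated case: 3 copies
print(concave_soft_capacitated_cost(4, 2.0, 1, math.sqrt))     # concave-cost case (u = 1)
```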
5 The Uncapacitated Facility Location Problem



In this section, we present a new analysis of the 1.52-approximation algorithm of 
Mahdian, Ye, and Zhang [16] for the UFLP. The analysis of the MYZ algorithm 
in [16] is based on combining a result of Jain et al. [12] (which is proved using 
factor-revealing LPs) with an analysis of a greedy augmentation procedure of 
Charikar et al. [3]. Here, we analyze the MYZ algorithm using a single factor-revealing LP. This gives us a new perspective on the MYZ algorithm. As a
corollary, we use a recent result of Thorup [21] that the JMS algorithm can 
be implemented in quasi-linear time to improve the running time of the MYZ 
algorithm. 

We begin by sketching the MYZ algorithm. The algorithm consists of two phases. In the first phase, we scale up the facility costs in the instance by a factor of δ (which will be fixed later), and then run the JMS algorithm (see [12] for a description) on the modified instance. In addition to finding a solution for the scaled instance, the JMS algorithm outputs the share of each city of the total cost of the solution. Let $\alpha_j$ denote the share of city j of the total cost (therefore, the total cost of the solution is $\sum_j \alpha_j$). The main step in the analysis of the JMS algorithm is to prove that for any collection S of one facility $f_S$ with opening cost $\delta f$ (f in the original instance) and k cities with connection costs $d_1, \ldots, d_k$ to $f_S$ and shares $\alpha_1, \ldots, \alpha_k$ of the total cost, the values $\delta f$, $d_j$'s, $\alpha_j$'s and $r_{j,i}$'s (whose definition is omitted here, since we do not need it) satisfy the inequalities (2)-(6) in Theorem A, except that the inequality (5) is replaced by

$$\forall\, 1 \le i \le k:\quad \sum_{j=1}^{i-1} \max(r_{j,i} - d_j, 0) + \sum_{j=i}^{k} \max(\alpha_i - d_j, 0) \le \delta f \qquad (10)$$

In the second phase of the MYZ algorithm we reduce the scaling factor δ continuously, until it gets to 1. If at any point during this process a facility could be opened without increasing the total cost (i.e., if the opening cost of the facility equals the total amount that cities can save by switching their “service provider” to that facility), then we open the facility and connect each city to its closest open facility. The second phase of the MYZ algorithm is equivalent to a greedy augmentation procedure of [8, 3], and, in fact, a lemma from [3] is used in [16] in order to analyze the second phase. Here we analyze this phase differently. First, we modify the second phase as follows: instead of decreasing the scaling factor continuously from δ to 1, we decrease it discretely in L steps, where L is a constant. Let $\delta_i$ denote the value of the scaling factor in the ith step. Therefore, $\delta = \delta_1 > \delta_2 > \cdots > \delta_L = 1$. We will fix the values of the $\delta_i$'s later. After decreasing the scaling factor from $\delta_{i-1}$ to $\delta_i$, we consider facilities in an arbitrary order, and open those that can be opened without increasing the total cost. We denote this modified algorithm by $\mathrm{MYZ}_L$. Clearly, if L is sufficiently large (depending on the instance), the algorithm $\mathrm{MYZ}_L$ computes the same solution as the MYZ algorithm.
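The discrete second phase described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the dictionary-based representation are hypothetical, the JMS first phase is not implemented (its set of open facilities is simply passed in and assumed nonempty), and each city is always connected to its closest open facility.

```python
from typing import Dict, Sequence, Set


def augmentation_phase(
    facility_cost: Dict[str, float],
    conn: Dict[str, Dict[str, float]],
    opened_after_phase1: Set[str],
    deltas: Sequence[float],
) -> Set[str]:
    """Discrete second phase of MYZ_L (sketch).

    facility_cost[i]: opening cost f_i in the original instance.
    conn[i][j]: connection cost between facility i and city j.
    opened_after_phase1: facilities opened by the first phase (assumed nonempty).
    deltas: delta_1 > delta_2 > ... > delta_L = 1, with delta_1 = delta.
    """
    opened = set(opened_after_phase1)
    cities = list(next(iter(conn.values())))

    def current_cost(j: str) -> float:
        # each city is served by its closest open facility
        return min(conn[i][j] for i in opened)

    for delta in deltas[1:]:              # lower the scaling factor to delta_2, ..., delta_L
        for i in facility_cost:           # an arbitrary (here: insertion) order
            if i in opened:
                continue
            savings = sum(max(current_cost(j) - conn[i][j], 0.0) for j in cities)
            if savings >= delta * facility_cost[i]:
                opened.add(i)             # opening i does not increase the scaled total cost
    return opened


def total_cost(opened, facility_cost, conn) -> float:
    """Facility plus connection cost of `opened` in the original (unscaled) instance."""
    cities = next(iter(conn.values()))
    return sum(facility_cost[i] for i in opened) + sum(
        min(conn[i][j] for i in opened) for j in cities
    )


fc = {"A": 1.0, "B": 5.0}
conn = {"A": {"x": 4.0, "y": 4.0}, "B": {"x": 1.0, "y": 1.0}}
opened = augmentation_phase(fc, conn, {"A"}, [2.0, 1.5, 1.0])
print(sorted(opened), total_cost(opened, fc, conn))
```

In this toy instance, facility B saves 6 in connection cost, which is not enough while its scaled cost is 1.5 · 5, but it is opened once the factor reaches 1.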
In order to analyze the above algorithm, we need to add extra variables and inequalities to the inequalities (2), (3), (4), (10) and (6). Let $r_{j,k+i}$ denote the connection cost that city j in S pays after we change the scaling factor to $\delta_i$ and process all facilities as described above (thus, $r_{j,k+1}$ is the connection cost of city j after the first phase). Therefore, by the description of the algorithm, we have

$$\forall\, 1 \le i \le L:\quad \sum_{j=1}^{k} \max(r_{j,k+i} - d_j, 0) \le \delta_i f, \qquad (11)$$

since otherwise we could open $f_S$ and decrease the total cost.

Now, we compute the share of city j of the total cost of the solution that the $\mathrm{MYZ}_L$ algorithm finds. In the first phase of the algorithm, the share of city j of the total cost is $\alpha_j$. Of this amount, $r_{j,k+1}$ is spent on the connection cost, and $\alpha_j - r_{j,k+1}$ is spent on the facility costs. However, since the facility costs are scaled up by a factor of δ in the first phase, the share of city j of facility costs in the original instance is equal to $(\alpha_j - r_{j,k+1})/\delta$. After we reduce the scaling factor from $\delta_i$ to $\delta_{i+1}$ ($i = 1, \ldots, L-1$), the connection cost of city j is reduced from $r_{j,k+i}$ to $r_{j,k+i+1}$. Therefore, in this step, the share of city j of the facility costs is $r_{j,k+i} - r_{j,k+i+1}$ with respect to the scaled instance, or $(r_{j,k+i} - r_{j,k+i+1})/\delta_{i+1}$ with respect to the original instance. Thus, at the end of the algorithm, the total share of city j of facility costs is

$$\frac{\alpha_j - r_{j,k+1}}{\delta} + \sum_{i=1}^{L-1} \frac{r_{j,k+i} - r_{j,k+i+1}}{\delta_{i+1}}.$$

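We also know that the final amount that city j pays for the connection cost is $r_{j,k+L}$. Therefore, using $\delta_1 = \delta$ and $\delta_L = 1$, the share of city j of the total cost of the solution is:

$$\frac{\alpha_j - r_{j,k+1}}{\delta} + \sum_{i=1}^{L-1} \frac{r_{j,k+i} - r_{j,k+i+1}}{\delta_{i+1}} + r_{j,k+L} = \frac{\alpha_j}{\delta} + \sum_{i=1}^{L-1} \left(\frac{1}{\delta_{i+1}} - \frac{1}{\delta_i}\right) r_{j,k+i}. \qquad (12)$$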



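This, together with a dual fitting argument similar to [12], implies the following.

Theorem 4. Let $(\gamma_f, \gamma_c)$ be such that $\gamma_f \ge 1$ and $\gamma_c$ is an upper bound on the solution of the following maximization program for every k.

$$
\begin{aligned}
\text{maximize}\quad & \frac{\sum_{j=1}^{k} \left( \frac{\alpha_j}{\delta} + \sum_{i=1}^{L-1} \left(\frac{1}{\delta_{i+1}} - \frac{1}{\delta_i}\right) r_{j,k+i} \right) - \gamma_f f}{\sum_{j=1}^{k} d_j}\\
\text{subject to}\quad & (2), (3), (4), (10), (11), (6)
\end{aligned}
\qquad (13)
$$

Then the $\mathrm{MYZ}_L$ algorithm is a $(\gamma_f, \gamma_c)$-approximation algorithm for the UFLP.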
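The equality in (12) is a telescoping rearrangement that relies on $\delta_1 = \delta$ and $\delta_L = 1$. The following exact-arithmetic check compares the step-by-step accounting with the closed form used in the objective of (13). The numbers are arbitrary illustrative values, not data from the paper, and the helper names are hypothetical.

```python
import random
from fractions import Fraction


def share_stepwise(alpha, r, deltas):
    """(alpha - r_1)/delta_1 + sum_{i=1}^{L-1} (r_i - r_{i+1})/delta_{i+1} + r_L.

    r[i - 1] stands for r_{j,k+i} and deltas[i - 1] for delta_i.
    """
    L = len(deltas)
    share = (alpha - r[0]) / deltas[0]
    for i in range(L - 1):
        share += (r[i] - r[i + 1]) / deltas[i + 1]
    return share + r[L - 1]


def share_closed_form(alpha, r, deltas):
    """alpha/delta_1 + sum_{i=1}^{L-1} (1/delta_{i+1} - 1/delta_i) r_i; requires delta_L = 1."""
    L = len(deltas)
    share = alpha / deltas[0]
    for i in range(L - 1):
        share += (1 / deltas[i + 1] - 1 / deltas[i]) * r[i]
    return share


def random_trial(rng):
    L = rng.randint(2, 8)
    # distinct scaling factors above 1, sorted decreasingly, followed by delta_L = 1
    above_one = rng.sample(range(101, 400), L - 1)
    deltas = [Fraction(x, 100) for x in sorted(above_one, reverse=True)] + [Fraction(1)]
    r = sorted((Fraction(rng.randint(0, 50)) for _ in range(L)), reverse=True)
    alpha = r[0] + Fraction(rng.randint(0, 50))
    return alpha, r, deltas


rng = random.Random(0)
trials = [random_trial(rng) for _ in range(1000)]
print(all(share_stepwise(*t) == share_closed_form(*t) for t in trials))
```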
In the following theorem, we analyze the factor-revealing LP (13) and rederive the main result of [16]. In order to do this, we need to set the values of the $\delta_i$'s. Here, for simplicity of computations, we set $\delta_i$ to $\delta^{(L-i)/(L-1)}$; however, it is easy to observe that any choice of the $\delta_i$'s such that the limit of $\max_i(\delta_i - \delta_{i+1})$ as L tends to infinity is zero will also work. The proof is omitted here.

Theorem 5. Let $(\gamma_f, \gamma_c)$ be a pair given by the maximization program in Theorem A, and let $\delta \ge 1$ be an arbitrary number. Then for every $\epsilon > 0$, if L is a sufficiently large constant, the $\mathrm{MYZ}_L$ algorithm is a $\left(\gamma_f + \ln(\delta) + \epsilon,\ 1 + \frac{\gamma_c - 1}{\delta}\right)$-approximation algorithm for the UFLP.

The above analysis enables us to prove the following result. 

Corollary 2. For every $\epsilon > 0$, there is a quasi-linear time $(1.52 + \epsilon)$-approximation algorithm for the UFLP, both in the distance oracle model (where the connection costs are given by a matrix) and in the sparse graph model (where the connection costs are distances in a given graph).

Proof Sketch. We use the $\mathrm{MYZ}_L$ algorithm for a large constant L. Thorup [21] shows that for every $\epsilon > 0$, the JMS algorithm can be implemented in quasi-linear time (in both distance oracle and graph models) with an approximation factor of $1.61 + \epsilon$. It is straightforward to see that his argument actually implies the stronger conclusion that the quasi-linear algorithm is a $(\gamma_f + \epsilon, \gamma_c + \epsilon)$-approximation, where $(\gamma_f, \gamma_c)$ are given by Theorem A. This shows that the first phase of the $\mathrm{MYZ}_L$ algorithm can be implemented in quasi-linear time. The second phase consists of constantly many rounds. Therefore, we only need to show that each of these rounds can be implemented in quasi-linear time. This is easy to see in the distance oracle model. In the graph model, we can use exactly the same argument as the one used by Thorup in the proof of Lemma 5.1 of [21]. □
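The constant 1.52 comes from choosing δ so that the two components of the bound in Theorem 5 are balanced. The sketch below finds that δ by bisection. It assumes $(\gamma_f, \gamma_c) = (1.11, 1.78)$, the pair commonly quoted for the JMS algorithm from [12]; Theorem A is not reproduced in this excerpt, so treat these two numbers as an assumption. The ε terms are ignored.

```python
import math


def balanced_delta(gamma_f: float, gamma_c: float, lo: float = 1.0, hi: float = 10.0,
                   iters: int = 100) -> float:
    """Bisection for delta >= 1 with gamma_f + ln(delta) = 1 + (gamma_c - 1)/delta.

    The left side increases and the right side decreases in delta (for gamma_c > 1),
    so their difference is increasing; [lo, hi] must bracket the sign change.
    """
    def diff(d: float) -> float:
        return (gamma_f + math.log(d)) - (1 + (gamma_c - 1) / d)

    for _ in range(iters):
        mid = (lo + hi) / 2
        if diff(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


gamma_f, gamma_c = 1.11, 1.78      # assumed (gamma_f, gamma_c) from Theorem A / [12]
delta = balanced_delta(gamma_f, gamma_c)
ratio = max(gamma_f + math.log(delta), 1 + (gamma_c - 1) / delta)
print(f"delta = {delta:.4f}, approximation factor = {ratio:.4f}")
```

Under these assumed values the balancing point lies near δ ≈ 1.5, and the resulting factor rounds to 1.52.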

Acknowledgements. We would like to thank Asaf Levin for pointing out that our analysis of the 2-approximation algorithm for the soft-capacitated facility location problem is tight.
A Proof of Lemma 1



Proof. Let SOL be an arbitrary solution for LFLP(a, b, c), which can also be viewed as a solution for UFLP(b, $\bar c$) for $\bar c = c + a$. Consider a facility f that is open in SOL, and the set of clients connected to it in SOL. Let k denote the number of these clients, $f(k) = ak + b$ (for $k > 0$) be the facility cost function of f, and $\bar d_j$ denote the connection cost between client j and the facility f in the instance UFLP(b, a + c). Therefore, $d_j = \bar d_j - a$ is the corresponding connection cost in the original instance LFLP(a, b, c). Recall [12] the definition of $\alpha_j$ and $r_{j,i}$ in the factor-revealing LP of Theorem A. It is proved [12] that $\alpha_i \le r_{j,i} + \bar d_j + \bar d_i$. We strengthen this inequality as follows.

Claim. $\alpha_i \le r_{j,i} + d_j + d_i$.

Proof. It is true if $\alpha_i = \alpha_j$, since this happens only if $r_{j,i} = \alpha_j$. Otherwise, consider clients i and j (< i) at time $t = \alpha_i - \epsilon$. Let s be the facility j is assigned to at time t. By the triangle inequality, we have

$$\bar c_{si} = c_{si} + a \le c_{sj} + d_i + d_j + a = \bar c_{sj} + d_i + d_j \le r_{j,i} + d_i + d_j.$$

On the other hand, $\alpha_i \le \bar c_{si}$, since otherwise i could have connected to facility s at a time earlier than t. □

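It is also known [12] that, for every $1 \le i \le k$,

$$\sum_{j=1}^{i-1} \max(r_{j,i} - \bar d_j, 0) + \sum_{j=i}^{k} \max(\alpha_i - \bar d_j, 0) \le b.$$

Notice that $\max(y - x, 0) \ge \max(y, 0) - x$ if $x \ge 0$. Since $\bar d_j = d_j + a$ and the two sums together contain k terms, each of which decreases by at most a when $\bar d_j$ is replaced by $d_j$, we have

$$\sum_{j=1}^{i-1} \max(r_{j,i} - d_j, 0) + \sum_{j=i}^{k} \max(\alpha_i - d_j, 0) \le b + ka. \qquad (14)$$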

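The claim above and Inequality (14) show that the values $\alpha_j$, $r_{j,i}$, $d_j$, a, and b constitute a feasible solution of the following optimization program.

$$
\begin{aligned}
\text{maximize}\quad & \frac{\sum_{j=1}^{k} \alpha_j - \gamma_f (ak + b)}{\sum_{j=1}^{k} d_j}\\
\text{subject to}\quad & \forall\, 1 \le i < k: && d_i \le d_{i+1}\\
& \forall\, 1 \le j < i < k: && r_{j,i} \ge r_{j,i+1}\\
& \forall\, 1 \le j < i \le k: && \alpha_i \le r_{j,i} + d_i + d_j\\
& \forall\, 1 \le i \le k: && \sum_{j=1}^{i-1} \max(r_{j,i} - d_j, 0) + \sum_{j=i}^{k} \max(\alpha_i - d_j, 0) \le b + ka\\
& \forall\, 1 \le j \le i \le k: && \alpha_j, d_j, a, b, r_{j,i} \ge 0
\end{aligned}
\qquad (15)
$$

However, it is clear that the above optimization program and the factor-revealing LP in Theorem A are equivalent (substitute $f = ak + b$). This completes the proof of this lemma. □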
Abstract. A graph is called $\ell$-connected from U to r if there are $\ell$ internally disjoint paths from every node $u \in U$ to r. The Rooted Subset Connectivity Augmentation Problem (RSCAP) is as follows: given a graph $G = (V + r, E)$, a node subset $U \subseteq V$, and an integer k, find a smallest set F of new edges such that G + F is k-connected from U to r. In this paper we consider mainly a restricted version of RSCAP in which the input graph G is already $(k-1)$-connected from U to r. For this version we give an $O(\ln |U|)$-approximation algorithm, and show that the problem cannot achieve a better approximation guarantee than the Set Cover Problem (SCP) on $|U|$ elements and with $|V| - |U|$ sets. For the general version of RSCAP we give an $O(\ln k \ln |U|)$-approximation algorithm. For U = V we get the Rooted Connectivity Augmentation Problem (RCAP). For directed graphs RCAP is polynomially solvable, but for undirected graphs its complexity status is not known: no polynomial algorithm is known, and it is also not known to be NP-hard. For undirected graphs with the input graph G being $(k-1)$-connected from V to r, we give an algorithm that computes a solution of size exceeding a lower bound of the optimum by at most $(k-1)/2$ edges.



1 Introduction and Notation 

A graph is called $\ell$-connected from U to r if there are $\ell$ internally disjoint paths from every node in U to r. In this paper we consider the following problem:
Rooted Subset Connectivity Augmentation Problem (RSCAP):

Input: A graph $G = (V + r, E)$, a node subset $U \subseteq V$, and an integer k.

Output: A minimum size set F of new edges such that G + F is k-connected from U to r.

For G being $k_0$-connected from U to r we give an $O(\ln(k - k_0) \ln |U|)$-approximation algorithm for both the directed and the undirected RSCAP. On the other hand, we show that even for $k_0 = k - 1$, the directed RSCAP cannot have a better approximation ratio than the Set Cover Problem (SCP) on $|U|$ elements and with $|V| - |U|$ sets.

For U = V we get the Rooted Connectivity Augmentation Problem (RCAP). A generalization of RCAP in which one seeks an augmenting edge set of minimum weight is polynomially solvable for directed graphs [6], but is NP-hard for undirected graphs. However, the complexity status of an undirected RCAP (where every new edge has weight 1) is not known: no polynomial algorithm is known, and it is also not known to be NP-hard. We show an algorithm that computes a solution whose size exceeds the optimum by at most $(k-1)/2$ edges.

S. Arora et al. (Eds.): APPROX 2003 + RANDOM 2003, LNCS 2764, pp. 141-152, 2003.

We note that RCAP is related to the well-studied Vertex-Connectivity Augmentation Problem (VCAP): given a graph G and an integer k, find a smallest set F of new edges for which the graph G + F is k-(node) connected. For directed graphs, Frank and Jordan [5] showed a polynomial algorithm. The complexity status of undirected VCAP is not known, but the following algorithms were obtained. For the case of G being $(k-1)$-connected, Jordan [9,10] gave an algorithm that computes a solution whose size exceeds the optimum by at most $(k-1)/2$ edges. Recently, Jordan and Jackson [7] gave an algorithm that computes a solution with an additive gap of at most $((k - k_0)(k - 1) + 4)/2$, where $k_0$ is the initial connectivity of G. In [8], the same authors give an algorithm that for any fixed k computes an optimal augmenting edge set in polynomial time.

We now fix some notation and state preliminary facts used in the paper. An edge from u to v is denoted by uv. Given a graph, we call new edges that can be added to the graph links, to distinguish them from the existing edges. Let $opt(G) = opt_k(G)$ denote the size of an optimal solution to the RSCAP on input G and k. For an arbitrary edge set F and set X let $\deg_F(X)$ denote the degree of X with respect to F. Let $G = (V + r, E)$ be a graph. For $X \subseteq V$ we denote by $\Gamma_G(X) = \Gamma(X)$ the set $\{v \in V - X : uv \in E \text{ for some } u \in X\}$ of neighbors of X in V, and let $X^* = V - (X + \Gamma(X))$. Let $d_r(X)$ denote the number of edges going from X to r, and define $g(X) = d_r(X) + |\Gamma(X)|$. We say that X is $\ell$-tight (or simply that X is tight, if $\ell$ is understood) if $g(X) = \ell$. The following statement, which applies to both directed and undirected graphs, stems from Menger's Theorem.

Proposition 1. A graph $G = (V + r, E)$ is $\ell$-connected from U to r if and only if $g(X) \ge \ell$ for all $X \subseteq V$ with $X \cap U \ne \emptyset$.

Let $G = (V + r, E)$ be a graph and let $X, Y \subseteq V$ be arbitrary. The following “submodular” inequality, which is valid for both directed and undirected graphs, can be easily proved by counting the contribution of the nodes in $\Gamma(X)$, $\Gamma(Y)$ to its sides (e.g., see [2]).

$$g(X) + g(Y) \ge g(X \cap Y) + g(X \cup Y) \qquad (1)$$
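For tiny undirected graphs, g, inequality (1), and Proposition 1 can be checked by brute force over all subsets of V. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, the graph is given as an edge list (parallel edges to r are counted separately in $d_r$), and enumeration is exponential in |V|. The example is a cycle r-1-2-3-4-r with the chord 1-3, in which every node has exactly two internally disjoint paths to r.

```python
from itertools import combinations


def gamma(X, V, edges):
    """Gamma(X): nodes of V - X adjacent to some node of X (the root is excluded)."""
    out = set()
    for u, v in edges:
        if u in X and v in V and v not in X:
            out.add(v)
        if v in X and u in V and u not in X:
            out.add(u)
    return out


def g(X, V, edges, root):
    """g(X) = d_r(X) + |Gamma(X)|, counting parallel edges to the root separately."""
    d_r = sum(1 for u, v in edges if (u in X and v == root) or (v in X and u == root))
    return d_r + len(gamma(X, V, edges))


def all_subsets(V):
    items = sorted(V)
    return [frozenset(c) for size in range(len(items) + 1) for c in combinations(items, size)]


def satisfies_inequality_1(V, edges, root):
    """Check g(X) + g(Y) >= g(X & Y) + g(X | Y) for all X, Y subsets of V."""
    subs = all_subsets(V)
    return all(
        g(X, V, edges, root) + g(Y, V, edges, root)
        >= g(X & Y, V, edges, root) + g(X | Y, V, edges, root)
        for X in subs for Y in subs
    )


def rooted_connectivity(V, U, edges, root):
    """Largest l such that G is l-connected from U to r, via Proposition 1."""
    return min(g(X, V, edges, root) for X in all_subsets(V) if X & U)


V = {1, 2, 3, 4}
edges = [("r", 1), (1, 2), (2, 3), (3, 4), (4, "r"), (1, 3)]
print(satisfies_inequality_1(V, edges, "r"), rooted_connectivity(V, V, edges, "r"))
```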

2 Rooted Subset Connectivity Augmentation 

Theorem 2. For the restriction of a directed RSCAP to instances in which G is $(k-1)$-connected from U to r, there exist:

(i) an $O(\ln |U|)$-approximation algorithm, and

(ii) a polynomial time reduction from the Set Cover Problem (SCP) on universe U with $|V| - |U|$ sets such that there is a solution of size r to SCP if and only if there is a solution of size r to RSCAP.
Part (ii) of the theorem says that if one finds an algorithm with an approximation guarantee better than $O(\ln |U|)$ for the above restricted version of the RSCAP, then one can get an approximation guarantee better than $O(\ln |U|)$ for the SCP on ground set U; the latter is possible only if NP-hard problems can be solved in quasipolynomial time, see [4].

To prove the theorem, we will use the following well-known formulation of the SCP; in this formulation, J is the incidence graph of sets and elements, where A is the family of sets and B is the universe.

Input: A bipartite graph $J = (A + B, I)$ without isolated nodes.

Output: A minimum size subset $D \subseteq A$ such that $\Gamma_J(D) = B$.

We now prove Theorem 2. The following lemma is a consequence of inequality (1) and Proposition 1.

Lemma 3. Let G be $\ell$-connected from U to r, and let X, Y be $\ell$-tight sets such that $X \cap Y \cap U \ne \emptyset$. Then $X \cap Y$ and $X \cup Y$ are both $\ell$-tight.

Given an instance of the directed RSCAP with the input graph G being $\ell$-connected from U to r, we construct an instance $J = (A + B, I)$ of the SCP as follows: B is the family of the inclusion-minimal sets among the $\ell$-tight sets intersecting U, A = V, and for $a \in A$, $b \in B$ we have $ab \in I$ if, and only if, the subset of V corresponding to b contains a. Note that $|B| \le |U|$, by Lemma 3. The above construction is polynomial, since for every node $u \in U$ we can compute the unique set in B containing u (or determine that no such set exists) in polynomial time using max-flow techniques.

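Let $\tau^*$ be the optimal value of the following LP-relaxation for the obtained instance of the SCP:

$$\tau^* = \min\Big\{ \sum_{a \in A} x_a \;:\; \sum_{a \in \Gamma_J(b)} x_a \ge 1 \ \ \forall\, b \in B,\ \ x_a \ge 0 \ \ \forall\, a \in A \Big\}.$$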

By a well-known result of Lovász [11], the greedy algorithm (which repeatedly removes from J the node of maximum degree in A and all its neighbors, until B becomes empty) computes a feasible solution $D \subseteq A$ to the SCP of size at most $H(|B|)\tau^*$. By Proposition 1, $G + \{vr : v \in D\}$ is $(\ell + 1)$-connected from U to r. We claim that $|D| \le \frac{1}{k-\ell} H(|U|)\, opt_k(G)$. Let F be a link set such that G + F is k-connected from U to r, and let x be the vector on A = V defined by $x_v = \frac{1}{k-\ell} \deg_F(v)$. Since $\deg_F(X) \ge k - \ell$ for any $\ell$-tight set X of G, x is a feasible solution to the above LP-relaxation. Thus

$$|D| \le H(|B|)\tau^* \le H(|B|) \sum_{v \in V} x_v \le H(|U|) \frac{|F|}{k-\ell},$$

where $H(j)$ denotes the jth harmonic number. Consequently, the algorithm finds a link set of size at most $\frac{1}{k-\ell} H(|U|)\, opt_k(G)$ that augments G to be $(\ell+1)$-connected from U to r. Applying this step successively for $\ell = k_0, k_0 + 1, \ldots, k-1$ (note that $opt_k$ does not increase when edges are added) and summing the bounds gives a total of at most $H(|U|) \sum_{\ell=k_0}^{k-1} \frac{1}{k-\ell}\, opt_k(G) = H(|U|)H(k - k_0)\, opt_k(G)$ links. Thus we have proved the following statement, which for $k_0 = k - 1$ implies part (i) of Theorem 2:

Corollary 4. There exists an $H(|U|)H(k - k_0)$-approximation algorithm for the directed RSCAP with the input graph G being $k_0$-connected from U to r.
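The greedy step used above can be sketched on the bipartite formulation $J = (A + B, I)$. The function names and the toy instance are hypothetical. Ties in the maximum degree are broken by insertion order, and in the RSCAP construction each chosen node $a \in A = V$ would become a link ar.

```python
from fractions import Fraction


def harmonic(j: int) -> Fraction:
    """H(j) = 1 + 1/2 + ... + 1/j."""
    return sum((Fraction(1, i) for i in range(1, j + 1)), Fraction(0))


def greedy_set_cover(incidence):
    """incidence maps each node a of A to the set of elements of B adjacent to it in J.

    Repeatedly takes a node of A of maximum degree in the remaining graph and deletes
    it together with its neighbours, until no element of B is left.
    """
    uncovered = set().union(*incidence.values())
    chosen = []
    while uncovered:
        best = max(incidence, key=lambda a: len(incidence[a] & uncovered))
        chosen.append(best)
        uncovered -= incidence[best]
    return chosen


# Hypothetical instance: A = {a1, ..., a4}, B = {1, ..., 5}.
J = {"a1": {1, 2, 3}, "a2": {3, 4}, "a3": {4, 5}, "a4": {1, 5}}
D = greedy_set_cover(J)
links = [(a, "r") for a in D]      # links vr added to G in the RSCAP setting
print(D, links, harmonic(len(set().union(*J.values()))))
```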
To prove part (ii) of Theorem 2, we will show that given an instance $J = (A + B, I)$ of the SCP, one can construct in polynomial time an instance $G = (V + r, E)$ of the directed RSCAP with $k_0 = k - 1$, $V = A + B$, and $U = B$, such that:

(a) For any solution F' for the RSCAP there exists a solution F with $|F| = |F'|$ such that every edge in F connects some node $u \in V - U = A$ to r.

(b) $D \subseteq A$ is a solution to the SCP on J if, and only if, $F = \{vr : v \in D\}$ is a solution to the RSCAP on G.

Note that by Proposition 1, replacing any edge xy in a directed graph which is k-connected from U to r by a new edge xr results again in a graph that is k-connected from U to r. This implies that for any feasible solution F' for a directed RSCAP there always exists a feasible solution F with $|F| = |F'|$ such that r is the head of all the edges in F.

Given an instance $J = (A + B, I)$ for the SCP, we construct an instance $G = (V + r, E)$ for a directed RSCAP by directing the edges in J from B to A, adding a new node r and $k-1$ edges from each node in B to r, and setting U = B. Then G is $(k-1)$-connected from U to r, and by Proposition 1, (b) holds. Now let F' be a set of links incident to r such that G + F' is k-connected from U to r. If there is $ur \in F'$ with $u \in U$, then $\Gamma_G(u) \ne \emptyset$ (since in J there are no isolated nodes), and for any $a \in \Gamma_G(u)$ the graph G + F, where $F = F' - ur + ar$, is k-connected from U to r. Repeating this replacement for every such link, we conclude that (a) holds for the obtained instance of the RSCAP. This finishes the proof of Theorem 2.
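The reduction can be tried on a toy instance. The sketch below, whose helper names are hypothetical, builds G from an SCP incidence map as described above (edges of J directed from B to A, plus $k-1$ parallel edges from each node of B to r). It then tests $\ell$-connectivity from U to r by brute force via Proposition 1, using out-neighbours for $\Gamma$ in the directed case. It is only meant for tiny instances.

```python
from itertools import combinations


def build_rscap_instance(incidence, k):
    """SCP instance J = (A + B, I) -> directed RSCAP instance (V, U, edge list)."""
    B = set().union(*incidence.values())
    edges = [(b, a) for a, elems in incidence.items() for b in elems]   # directed B -> A
    edges += [(b, "r") for b in B for _ in range(k - 1)]               # k - 1 parallel edges b -> r
    return set(incidence) | B, B, edges


def g_directed(X, V, edges, root="r"):
    """d_r(X) (edges from X to r) plus |Gamma(X)| (out-neighbours of X in V - X)."""
    d_r = sum(1 for u, v in edges if u in X and v == root)
    out = {v for u, v in edges if u in X and v in V and v not in X}
    return d_r + len(out)


def is_connected_from(V, U, edges, ell):
    """Proposition 1: g(X) >= ell for every nonempty X subset of V meeting U."""
    items = sorted(V, key=str)
    return all(
        g_directed(set(c), V, edges) >= ell
        for size in range(1, len(items) + 1)
        for c in combinations(items, size)
        if set(c) & U
    )


J = {"a1": {"b1", "b2"}, "a2": {"b2", "b3"}}     # hypothetical SCP instance
V, U, E = build_rscap_instance(J, k=2)
for D in (["a1"], ["a1", "a2"]):
    F = [(a, "r") for a in D]
    print(D, is_connected_from(V, U, E + F, 2))
```

Only the cover {a1, a2} makes the graph 2-connected from U to r, matching property (b).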

For the undirected RSCAP similar results can be deduced. Using standard constructions, it is easy to prove that a ρ-approximation algorithm for the directed RSCAP implies a 2ρ-approximation algorithm for the undirected RSCAP. In particular, by Corollary 4, there exists a $2H(|U|)H(k - k_0)$-approximation algorithm for the undirected RSCAP. On the other hand, for $k_0 = k - 1$, one can show by a similar reduction that if one finds a solution of size r to the corresponding instance of the undirected RSCAP, then one can find a solution of size at most 2r for the SCP.

3 Undirected Rooted Connectivity Augmentation 

In the rest of the paper we consider an undirected RCAP with $k_0 = k - 1$; that is, we will assume that G is $(k-1)$-connected (from V) to r, and “tight” means $(k-1)$-tight. By Lemma 3, the (inclusion) minimal tight sets are pairwise disjoint; let $\nu = \nu(G)$ denote their number. For $T \subseteq V$, the T-components are the connected components of G − T, and the T-components not containing r are the sides of T. Let b(T) be the number of T-components. If $|T| = k - 1$ and $b(T) \ge 3$ then T is a shredder. Let $b(G) = b_k(G) = \max\{b(T) : T \subseteq V, |T| = k - 1\}$. If G + F is k-connected then $|F| \ge \nu(G)/2$ (since $\deg_F(X) \ge 1$ for every tight set $X \subseteq V$, the minimal tight sets are pairwise disjoint, and every link has two ends) and $|F| \ge b(G) - 1$ (since for any $T \subseteq V$ with $|T| = k - 1$, F must induce a connected graph on the T-components). Thus

$$opt(G) \ge \max\{\lceil \nu(G)/2 \rceil,\ b(G) - 1\}.$$
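For very small graphs both quantities in this lower bound can be computed by brute force, which helps when experimenting with the examples below. The sketch is only an illustration: the helper names are hypothetical, G is undirected and is assumed to be $(k-1)$-connected to r, $\nu(G)$ is obtained by enumerating all tight sets, and $b(G)$ by removing every $(k-1)$-subset of V and counting components with union-find. Applied to Example 2 with k = 4 ($K_4$), it gives a lower bound of 2, while the text states $opt(G) = k - 1 = 3$ there.

```python
from itertools import combinations
from math import ceil


def g(X, V, edges, root):
    """g(X) = d_r(X) + |Gamma(X)| in an undirected graph on V + r (edge list)."""
    d_r, gamma = 0, set()
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            if a in X:
                if b == root:
                    d_r += 1
                elif b not in X:
                    gamma.add(b)
    return d_r + len(gamma)


def count_components(nodes, edges):
    """Number of connected components of the subgraph induced by `nodes` (union-find)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in edges:
        if u in parent and v in parent:
            parent[find(u)] = find(v)
    return len({find(v) for v in nodes})


def lower_bound(V, edges, root, k):
    """Return (nu(G), b(G), max(ceil(nu/2), b(G) - 1)) by brute force."""
    items = sorted(V, key=str)
    subsets = [frozenset(c) for size in range(1, len(items) + 1) for c in combinations(items, size)]
    tight = [X for X in subsets if g(X, V, edges, root) == k - 1]
    nu = sum(1 for X in tight if not any(Y < X for Y in tight))   # inclusion-minimal tight sets
    b = max((count_components((set(V) - set(T)) | {root}, edges)
             for T in combinations(items, k - 1)), default=1)
    return nu, b, max(ceil(nu / 2), b - 1)


# Example 2 with k = 4: G = K_4 on the nodes 1, 2, 3 and the root r.
print(lower_bound({1, 2, 3}, list(combinations([1, 2, 3, "r"], 2)), "r", 4))
```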
For $k - 1 = 0$, it is clear that any tree on the components of G is an augmenting edge set of size $b(G) - 1$. For $k - 1 = 1$ it is also easy to compute an optimal solution in polynomial time using the lower bound $\max\{\lceil \nu'(G)/2 \rceil, b(G) - 1\}$, where $\nu'(G) = \nu(G) + 1$ if there is a tight set that contains all the minimal tight sets, and $\nu'(G) = \nu(G)$ otherwise. For $k - 1 \ge 2$ we prove the following theorem:

Theorem 5. There is a polynomial algorithm that, given a graph G which is $(k-1)$-connected to r, finds a link set F of size at most $\max\{\lceil \nu(G)/2 \rceil + \lfloor (k-1)/2 \rfloor,\ b(G) - 1\}$ such that G + F is k-connected to r.

We now give some preliminary statements used in the rest of the paper. The following inequality can be easily verified by counting the contribution to its sides of nodes in $\Gamma(X)$, $\Gamma(Y)$ and the edges incident to r.

$$g(X) + g(Y) \ge g(X^* \cap Y) + g(X \cap Y^*) + 2d_r(X \cap Y) \qquad (2)$$

Two disjoint subsets X, Y of V are adjacent if there is an edge with one end in X and the other end in Y. Using inequalities (1) and (2) it is not hard to derive the following properties of tight sets.

Lemma 6. Let X, Y be two tight sets in G.

(i) If $X \cap Y \ne \emptyset$ then $X \cap Y$, $X \cup Y$ are both tight.

(ii) If the sets $X \cap Y^*$, $Y \cap X^*$ are nonempty, then they are both tight and nonadjacent, and $d_r(X \cap Y) = 0$.

(iii) If X, Y are disjoint and $|X| \le |Y|$ then exactly one of the following holds: (a) $X \cap Y^*$, $Y \cap X^*$ are nonadjacent tight sets, or (b) $X \subseteq \Gamma(Y)$.

3.1 Independent Families 

Definition 7. A family $\mathcal{F}$ of pairwise disjoint tight sets is independent if there exists a partition $\Pi$ of $\mathcal{F}$ and a family $\mathcal{S}(\mathcal{F}) = \{S_{\mathcal{P}} : \mathcal{P} \in \Pi\}$ of pairwise disjoint tight sets such that:

(i) For every $\mathcal{P} \in \Pi$: $\bigcup\{S : S \in \mathcal{P}\} \subseteq S_{\mathcal{P}}$; if $|\mathcal{P}| \ne 2$ then equality holds, and if $|\mathcal{P}| \ge 3$ then $\mathcal{P}$ consists of some sides of a shredder.

(ii) For any disjoint $X, Y \in \mathcal{F} \cup \mathcal{S}(\mathcal{F})$: $X - \Gamma(Y)$ and $Y - \Gamma(X)$ are both nonempty if, and only if, X, Y belong to the same part in $\Pi$.

If, in addition to (i) and (ii), for any part $\mathcal{P} = \{S_i, S_j\} \in \Pi$ we have that any tight set that intersects $S_{\mathcal{P}}$ is contained in one of $S_i, S_j$, then $\mathcal{F}$ is strongly independent.

Let $\mathcal{R}$ be the following relation on tight sets: $(X, Y) \in \mathcal{R}$ if $X - \Gamma(Y)$ and $Y - \Gamma(X)$ are both nonempty. Given a family $\mathcal{F}$ of tight sets, let $\mathcal{R}(\mathcal{F})$ denote the restriction of $\mathcal{R}$ to $\mathcal{F}$. Clearly, $\mathcal{R}(\mathcal{F})$ is symmetric and reflexive, and, if $\mathcal{F}$ is independent, then $\mathcal{R}(\mathcal{F})$ is an equivalence, with the corresponding partition into equivalence classes $\Pi$ as in the definition. It is not hard to verify that any subfamily of an independent family is also independent. Note that condition (ii) in the definition of an independent family implies $S' \subseteq \Gamma(S'')$ or $S'' \subseteq \Gamma(S')$ for any distinct $S', S'' \in \mathcal{S}(\mathcal{F})$. But Lemma 6(iii) implies a stronger statement:
Proposition 8. If $|S''| \le |S'|$ for distinct $S', S'' \in \mathcal{S}(\mathcal{F})$, then $S'' \subseteq \Gamma(S')$.

We call an independent family trivial if the corresponding partition is trivial, that is, if $\Pi = \{\mathcal{F}\}$. Let $\beta(G)$ denote the maximum cardinality of an independent family in G. Note that even trivial independent families strictly generalize shredders. Indeed, any subfamily of sides of a shredder forms a trivial independent family; thus $\beta(G) \ge b(G) - 1$. However, even trivial independent families with two sets might not correspond to a shredder; see Example 1 below. If $\mathcal{F}$ is a trivial independent family, then $|\mathcal{F}|$ can be as large as $n - k + 1$. However, as Theorem 9 below implies, a nontrivial independent family has at most $k - 1$ sets; Examples 2 and 3 below show that this bound is tight. For a family $\mathcal{F}$ of sets, let $\|\mathcal{F}\|$ denote the cardinality of the union of the sets in $\mathcal{F}$.

Theorem 9. Let $\mathcal{F}$ be a nontrivial independent family, and let $S' = S_{\mathcal{P}'}$ be the largest set in $\mathcal{S}(\mathcal{F})$. Then $|\mathcal{P}'| + \|\Pi - \mathcal{P}'\| \le k - 1$.

Proof. We need the following claim:

Claim: Let Y be an $\ell$-tight set, and suppose that there is a node $v \in Y$ such that there are $\ell$ internally disjoint paths from r to v. Then for any set $X \subseteq V$ disjoint from Y, $d_r(X) + |\Gamma(X) - (Y \cup \Gamma(Y))| \ge |X \cap \Gamma(Y)|$.

Proof: Consider a set of $\ell$ internally disjoint paths from r to v. Since Y is $\ell$-tight, each of these paths enters Y through exactly one node of $\Gamma(Y)$ or one edge between r and Y, and every node of $\Gamma(Y)$ lies on one of the paths; hence $|X \cap \Gamma(Y)|$ of them contain a node from X. In each of these $|X \cap \Gamma(Y)|$ paths pick the first node whose successor is in X. Such a node is either r or in $\Gamma(X) - (Y \cup \Gamma(Y))$, so it contributes 1 to the left side of the inequality.

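Note that $|S_{\mathcal{P}}| \ge \|\mathcal{P}\|$, and $|S_{\mathcal{P}}| = \|\mathcal{P}\|$ if $|\mathcal{P}| \ne 2$, for any $\mathcal{P} \in \Pi$. Let $S'' = S_{\mathcal{P}''}$ be the second largest set in $\mathcal{S}(\mathcal{F})$. Then $g(S'') = k - 1$, $|\Gamma(S'') \cap S'| \ge |\mathcal{P}'|$ (by condition (ii) in Definition 7), and $S'' \subseteq \Gamma(S')$ (by Proposition 8). The statement follows by applying the claim above with $Y = S'$ and $X = S''$:

$$
\begin{aligned}
k - 1 = g(S'') &= d_r(S'') + |\Gamma(S'') - (S' \cup \Gamma(S'))| + |\Gamma(S'') \cap S'| + |\Gamma(S'') \cap \Gamma(S')|\\
&\ge |S''| + |\mathcal{P}'| + \|\Pi - \mathcal{P}' - \mathcal{P}''\| \ge |\mathcal{P}'| + \|\Pi - \mathcal{P}'\|.
\end{aligned}
$$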

Examples:

1. Let u, v be two nodes of a cycle, where $r \ne u, v$ is arbitrary and $k - 1 = 2$. Then $\mathcal{F} = \{\{u\}, \{v\}\}$ is an independent family. If u, v are adjacent, then $\mathcal{F}$ is nontrivial and strongly independent. Assume that u, v are nonadjacent. Then $\mathcal{F}$ is trivial and not strongly independent. Let us consider some modifications. Let P be the path between u and v. Suppose that none of u, v is incident to r. Let u' be the neighbor of u not belonging to P and define v' in the same way. Let G be the graph obtained by connecting each of u', v' to all the internal nodes of P. If P has at least two internal nodes, then $\mathcal{F}$ is strongly independent in G. Otherwise (P has one internal node) $\mathcal{F}$ is not strongly independent, but the family $\{\{u, u'\}, \{v, v'\}\}$ is strongly independent; $\mathcal{F}$ becomes strongly independent if we add to G the links ru', rv'. Note that there are no shredders in the graphs considered.

2. Let $G = K_k$ be a complete graph on k nodes. Then $\lceil \nu(G)/2 \rceil = \lceil (k-1)/2 \rceil$, $b(G) - 1 = 0$, but $opt(G) = \beta(G) = k - 1$. Here any family of pairwise disjoint tight sets forms a trivial independent family. Let us replace one node of G by a clique of size at least $k - 1$, connecting the edges of $K_k$ to distinct nodes of the clique. The new graph contains a nontrivial independent family of size $k - 1$.

3. Let $G = K_{k-1,k-1}$ be a complete bipartite graph with $k - 1$ nodes on each side and parts R, S, where $r \in R$. Then $\lceil \nu(G)/2 \rceil = \lceil (2k-3)/2 \rceil = k - 1$ and $\beta(G) = b(G) + 1 = k - 1$. Indeed, $b(G) = b(S) = k - 2$, and $\beta(G) = |R| - 1 + 1 = k - 1$ since $R - r + s$ is an independent family for any $s \in S$ (so there are $k - 1$ distinct nontrivial independent families of size $\beta(G) = k - 1$ in G). Also, $opt(G) = k - 2 + \lceil (k-1)/2 \rceil$. An optimal augmenting edge set is obtained by connecting every node in $R - r$ to r, picking a maximum matching on S, and, if $k - 1$ is odd, adding one more edge from the unmatched node in S to r.

4. Let r be a leaf of a tree G (so $k - 1 = 1$) with an odd number $\nu + 1$ of leaves, in which every non-leaf node has degree 3. Then $b(G) - 1 = \beta(G) = 2$, but $opt(G) = \lceil (\nu + 1)/2 \rceil$.

3.2 Main Results 

A tight set is a core if it contains a unique minimal tight set. By Lemma 6(i), the union and intersection of any two intersecting cores are also cores. Thus for every minimal core C (that is, a minimal tight set) there exists a unique maximal core S containing it. As was mentioned in Section 2, the minimal cores can be computed in polynomial time. Let $C_1, \ldots, C_\nu$ be the minimal cores of G. By Proposition 1, G + F is k-connected to r if, and only if, G + F has no cores; thus the graph $G + \{v_i r : v_i \in C_i,\ i = 1, \ldots, \nu\}$ is k-connected to r, and $opt(G) \le \nu(G)$. We say that a link e is $(\nu, 2)$-reducing for G if $\nu(G + e) \le \nu(G) - 2$. To prove Theorem 5 we use the following two theorems:

Theorem 10. Let G be $(k-1)$-connected to r and let $\mathcal{F}$ be a subfamily of the family of maximal cores of G. Then exactly one of the following holds:

(i) there is a $(\nu, 2)$-reducing link for G connecting two distinct cores in $\mathcal{F}$, or

(ii) $\mathcal{F}$ is strongly independent.

Thus, since by Theorem 9 a nontrivial independent family has at most $k - 1$ sets, if $|\mathcal{F}| \ge k$, then either there exists a $(\nu, 2)$-reducing link connecting two cores in $\mathcal{F}$, or the sets in $\mathcal{F}$ are sides of the same shredder. In particular, if $\nu(G) \ge k$, then either there exists a $(\nu, 2)$-reducing link for G, or $\nu(G) = b(G) - 1$.



Theorem 11. Let G be $(k-1)$-connected to r. If $b(G) \ge k$, then there exists a polynomial algorithm that finds a link set F of size at most $\max\{\lceil (\nu(G) + 1)/2 \rceil,\ b(G) - 1\}$ such that G + F is k-connected to r.

Proof (of Theorem 5): The algorithm is as follows:

If $b(G) \ge k$, find an augmenting link set as in Theorem 11.

Else, perform the following two steps:

1. Find and add a $(\nu, 2)$-reducing link, as long as one exists.

2. In the resulting graph, add one link from every minimal core to r.



148 



Zeev Nutov 



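If $b(G) \ge k$ the algorithm finds an augmenting link set as required, by Theorem 11. Suppose that $b(G) \le k - 1$, and let $F_1, F_2$ be the link sets added at steps 1 and 2, respectively. Then the resulting graph $G + F_1 + F_2$ is k-connected to r, by Proposition 1. By Theorems 10 and 9, $|F_2| = \nu(G + F_1) \le k - 1$. Since every link of $F_1$ decreases $\nu$ by at least 2, we have $|F_1| \le \lfloor (\nu - |F_2|)/2 \rfloor$, where $\nu = \nu(G)$. Thus

$$|F_1| + |F_2| \le \left\lfloor \frac{\nu - |F_2|}{2} \right\rfloor + |F_2| \le \lceil \nu/2 \rceil + \lfloor |F_2|/2 \rfloor \le \lceil \nu/2 \rceil + \lfloor (k-1)/2 \rfloor.$$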
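The counting at the end of this proof can be checked exhaustively for small parameters. The sketch sets $|F_1|$ to its upper bound $\lfloor (\nu - |F_2|)/2 \rfloor$ and lets $|F_2|$ range over $0, \ldots, \min(\nu, k-1)$. It verifies only the arithmetic, not the graph-theoretic statements, and the function name is hypothetical.

```python
from math import ceil


def worst_case_excess(k: int, max_nu: int) -> int:
    """Largest value of (|F1| + |F2|) - (ceil(nu/2) + floor((k-1)/2)) over
    0 <= nu <= max_nu and 0 <= |F2| <= min(nu, k - 1), with |F1| = floor((nu - |F2|)/2)."""
    worst = None
    for nu in range(max_nu + 1):
        for f2 in range(min(nu, k - 1) + 1):
            links = (nu - f2) // 2 + f2
            excess = links - (ceil(nu / 2) + (k - 1) // 2)
            worst = excess if worst is None else max(worst, excess)
    return worst


for k in range(2, 7):
    print(k, worst_case_excess(k, 40))
```

A worst-case excess of 0 means the bound is never exceeded and is attained for some parameter values.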

The proof of Theorems 10 and 11 follows. Let $C_1, \ldots, C_\nu$ be the minimal cores of G. For $I \subseteq \{1, \ldots, \nu\}$, let $\mathcal{S}_I$ denote the collection of tight sets containing $\bigcup_{i \in I} C_i$ and not containing any other minimal core. Let $S_I$ be the union of the sets in $\mathcal{S}_I$; we set $S_I = \emptyset$ if $\mathcal{S}_I = \emptyset$. By Lemma 6(i), if $S_I \ne \emptyset$ then $S_I$ is tight, and thus it is the inclusion-maximal set in $\mathcal{S}_I$. Also, for any $I' \subseteq I$, $S_{I'} \subseteq S_I$ holds. For simplicity, $S_{ij}$ means $S_{\{i,j\}}$ and $S_i = S_{\{i\}}$. Note that $S_I \cap S_J = S_{I \cap J}$ for any $I, J \subseteq \{1, \ldots, \nu\}$ with $S_I, S_J \ne \emptyset$. Thus we have:

Proposition 12. (i) The sets $S_i$ are pairwise disjoint.

(ii) If $S_{ip}, S_{pj} \ne \emptyset$, then $S_{ip} \cap S_{pj} = S_p$.

Clearly, if there is a $(\nu, 2)$-reducing link, then its endnodes belong to distinct minimal cores $C_i, C_j$. Using Lemma 6 it is not hard to prove the following statement:

Proposition 13. Let $C_i, C_j$ be minimal cores. Then the following are equivalent:

(i) There exists a $(\nu, 2)$-reducing link connecting $C_i$ and $C_j$.

(ii) (A) $S_i - \Gamma(S_j)$ and $S_j - \Gamma(S_i)$ are both nonempty, and (B) $S_{ij} = \emptyset$.

(iii) Any link connecting $C_i$ and $C_j$ is $(\nu, 2)$-reducing.

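Let $\mathcal{F}$ be a subfamily of the family of maximal cores of G.

Lemma 14. If no $(\nu, 2)$-reducing link that connects two cores in $\mathcal{F}$ exists, then the relation $\mathcal{R}(\mathcal{F})$ is an equivalence.

Proof. Symmetry and reflexivity are obvious, so we need to prove transitivity. Suppose therefore that $(S_i, S_p), (S_p, S_j) \in \mathcal{R}(\mathcal{F})$ for distinct $S_i, S_p, S_j \in \mathcal{F}$. Then $S_{ip}, S_{pj} \ne \emptyset$ by Proposition 13. Thus $C_i \subseteq S_{ip} \cap S^*_{pj} \subseteq S^*_j$, since $S_j \subseteq S_{pj}$, by Lemma 6(ii). Thus we must have $C_i \cap \Gamma(S_j) = \emptyset$. For a similar reason, $C_j \cap \Gamma(S_i) = \emptyset$. Hence $S_i - \Gamma(S_j) \supseteq C_i$ and $S_j - \Gamma(S_i) \supseteq C_j$ are both nonempty, that is, $(S_i, S_j) \in \mathcal{R}(\mathcal{F})$. This proves transitivity.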

The following two lemmas (whose proofs are omitted) are used to establish that the equivalence classes of $\mathcal{R}(\mathcal{F})$ of size at least three correspond to sides of a shredder.

Lemma 15. Let $A, B$ be disjoint nonadjacent tight sets. If $A \cup B$ is tight, then $\Gamma(A) = \Gamma(B)$ and $d_r(A) = d_r(B) = 0$.

Lemma 16. Let $A, B, C$ be pairwise disjoint tight sets such that none of them is contained in the set of neighbors of another, and such that the union of any two of them is tight. Then $d_r(A) = d_r(B) = d_r(C) = 0$ and $\Gamma(A) = \Gamma(B) = \Gamma(C)$.
Corollary 17. Let $\mathcal{F}$ be a subfamily of the family of maximal cores of $G$ such that no $(\nu, 2)$-reducing link that connects two cores in $\mathcal{F}$ exists, and let $\mathcal{P}$ be an equivalence class of $\mathcal{R}(\mathcal{F})$. Then:

(i) If $|\mathcal{P}| \ge 3$, then $\mathcal{P}$ consists of some sides of the same shredder.

(ii) If $\mathcal{P} = \{S_i, S_j\}$, then $S_{ij} \neq \emptyset$.

Proof. Part (ii) follows from Proposition 13, and we will prove Part (i). We will show that if distinct $S_i, S_j, S_p$ belong to the same class of $\mathcal{R}(\mathcal{F})$, then they satisfy the conditions of Lemma 16. Since $S_i, S_j, S_p$ are distinct, they are pairwise disjoint (by Proposition 12(i)). By the definition of $\mathcal{R}(\mathcal{F})$, none of them is contained in the set of neighbors of another. It remains therefore to show that the union of any two of them, say $S_i \cup S_j$, is tight. By Proposition 13 and the definition of $\mathcal{R}(\mathcal{F})$, each one of the sets $S_{ij}, S_{jp}, S_{pi}$ is nonempty. Thus $(S_{ip} \cup S_{pj}) \cap S_{ij}$ is tight, by Lemma 6, and

$$(S_{ip} \cup S_{pj}) \cap S_{ij} = (S_{ip} \cap S_{ij}) \cup (S_{pj} \cap S_{ij}) = S_i \cup S_j,$$

where the last equation follows from Proposition 12(ii). Thus $S_i \cup S_j$ is tight.

Proof (of Theorem 10). Let $\mathcal{F}$ be as in Theorem 10. From Proposition 13 and the definition of an independent family, it follows that if case (ii) of Theorem 10 holds (that is, if $\mathcal{F}$ is strongly independent), then case (i) cannot hold. The rest of the proof shows that if case (i) does not hold, then case (ii) must hold.

Suppose therefore that no $(\nu, 2)$-reducing link connecting two distinct cores in $\mathcal{F}$ exists. Then, by Lemma 14, $\mathcal{R}(\mathcal{F})$ is an equivalence; let $\Pi$ be its partition into the corresponding equivalence classes. For $\mathcal{P} \in \Pi$, let $S_{\mathcal{P}}$ be the union of the sets in $\mathcal{P}$ if $|\mathcal{P}| \neq 2$, and $S_{\mathcal{P}} = S_{ij}$ if $\mathcal{P} = \{S_i, S_j\}$; let $\mathcal{S} = \{S_{\mathcal{P}} : \mathcal{P} \in \Pi\}$. Combining this setting with Corollary 17, we conclude that condition (i) in the definition of an independent family is satisfied for $\mathcal{F}$, $\Pi$, and $\mathcal{S}$. Moreover, if $\mathcal{F}$ is independent, then it is strongly independent.

We show that condition (ii) is also satisfied. Let $X, Y \in \mathcal{F} \cup \mathcal{S}$ be disjoint with $X - \Gamma(Y) \neq \emptyset$ and $Y - \Gamma(X) \neq \emptyset$. Then, by Lemma 6(iii), $X \cap Y^*$ and $Y \cap X^*$ are both tight. Let $S_x$ be an arbitrary maximal core intersecting $X \cap Y^*$, and let $S_y$ be an arbitrary maximal core intersecting $Y \cap X^*$. Note that $S_i \subseteq S$ and $S_i \in \mathcal{F}$ for any maximal core $S_i$ and any $S \in \mathcal{F} \cup \mathcal{S}$ that intersect. Thus $S_x \subseteq X$, $S_y \subseteq Y$, and $S_x, S_y \in \mathcal{F}$. Moreover, $S_x$ intersects $X \cap Y^*$ and $S_y$ intersects $Y \cap X^*$, implying that $S_x - \Gamma(S_y)$ and $S_y - \Gamma(S_x)$ are both nonempty. Therefore, $S_x$ and $S_y$ belong to the same class of $\mathcal{R}(\mathcal{F})$. Since $X$ and $Y$ are disjoint, $X = S_x$ and $Y = S_y$, which finishes the proof.

This completes the proof of Theorem 10. We now prove Theorem 11.

Lemma 18. Let $T$ be a shredder and let $Y$ be a tight set.

(i) If $\Gamma(Y) = T$, then $Y$ is a union of some sides of $T$.

(ii) If $Y$ intersects two distinct sides $X_i, X_j$ of $T$, then $X_i, X_j \subseteq Y$.

Proof. Part (i) is obvious, and we will prove part (ii). By Lemma 6(i), the sets $Y \cap X_i$ and $Y \cap X_j$ are tight. Their union is the intersection of the two intersecting tight sets $Y$ and $X_i \cup X_j$, so it is also tight. Moreover, $Y \cap X_i$ and $Y \cap X_j$ are nonadjacent, since $X_i$ and $X_j$ are nonadjacent. Thus $\Gamma(Y \cap X_i) = \Gamma(Y \cap X_j) = T$, by Lemma 15. Part (ii) now follows from part (i).
Two intersecting sets $X, Y$ are crossing (or $Y$ crosses $X$) if neither of them contains the other.

Lemma 19. No tight set crosses a side or the union of sides of a shredder.

Proof. Let $Y$ be a tight set intersecting some side $X$ of a shredder $T$. By Lemma 18(ii), if $Y$ intersects all sides of $T$, then it contains all of them. Assume therefore that there is a side $X'$ of $T$ disjoint from $Y$. Let $Z = X \cup Y$. Then:

(i) $d_r(Z \cup X') = d_r(Z)$, since $d_r(X') = 0$.

(ii) $\Gamma(Z \cup X') \subseteq \Gamma(Z)$, and $\Gamma(Z \cup X') = \Gamma(Z)$ if, and only if, $Z$ and $X'$ are nonadjacent (since $\Gamma(X) = \Gamma(X') = T$ and $X \subseteq Z$).

Thus $Z$ and $X'$ are tight and nonadjacent. Moreover, $Z \cup X'$ is tight, since $Z$ and $X \cup X'$ are intersecting and tight, and since $X \subseteq Z$. Thus $\Gamma(Z) = \Gamma(X') = T$, by Lemma 15. Consequently, $Z$ must be a union of some sides of $T$, by Lemma 18(i). Now, if $Y$ intersects a side of $T$ distinct from $X$, then $X \subseteq Y$, by Lemma 18(ii); otherwise, $Y \subseteq X$, and the proof is complete.

Given a nontrivial partition $\mathcal{W}$ of a groundset $W$, a link set $F$ on $W$ is a $\mathcal{W}$-connecting $W$-cover if the following three conditions hold:

(a) $\deg_F(w) \ge 1$ for every $w \in W$;

(b) every link in $F$ connects distinct parts of $\mathcal{W}$;

(c) $F$ induces a connected graph on the parts of $\mathcal{W}$.

Let $\max(\mathcal{W})$ denote the largest cardinality of a set in $\mathcal{W}$. The following statement can be proved by induction on $|W|$.

Lemma 20. Let $\mathcal{W}$ be a nontrivial partition of a groundset $W$. Then the minimum cardinality of a $\mathcal{W}$-connecting $W$-cover equals $\max\{\lceil |W|/2 \rceil, \max(\mathcal{W}), |\mathcal{W}| - 1\}$, and an optimal cover can be found in polynomial time.
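Lemma 20 can be sanity-checked on small partitions. The sketch below evaluates the formula and compares it with a brute-force search over link sets. It is only an illustration: exhaustive search is not the polynomial-time algorithm the lemma refers to. Representing parts as Python lists and links as unordered pairs of elements is an assumption made for this example.

```python
from itertools import combinations
from math import ceil


def lemma20_value(parts):
    """max{ceil(|W|/2), max(W), |W| - 1} for a partition `parts` of W."""
    size_w = sum(len(p) for p in parts)
    return max(ceil(size_w / 2), max(len(p) for p in parts), len(parts) - 1)


def is_connecting_cover(parts, links):
    """Check conditions (a)-(c) of a W-connecting W-cover."""
    part_of = {w: i for i, p in enumerate(parts) for w in p}
    # (b) every link connects distinct parts
    if any(part_of[u] == part_of[v] for u, v in links):
        return False
    # (a) every element of W is an endpoint of some link
    if {w for link in links for w in link} != set(part_of):
        return False
    # (c) the links connect all parts (union-find on part indices)
    root = list(range(len(parts)))

    def find(i):
        while root[i] != i:
            i = root[i]
        return i

    for u, v in links:
        root[find(part_of[u])] = find(part_of[v])
    return len({find(i) for i in range(len(parts))}) == 1


def brute_force_min_cover(parts):
    """Size of the smallest cover, trying link sets by increasing size (tiny inputs only)."""
    elems = [w for p in parts for w in p]
    candidates = list(combinations(elems, 2))
    for size in range(len(candidates) + 1):
        for links in combinations(candidates, size):
            if is_connecting_cover(parts, links):
                return size
    return None
```

In the setting of Corollary 21, take $|W| = \nu(G) + 1$, $|\mathcal{W}| = b(T)$, and $\max(\mathcal{W}) \le b(T) - 1$. The same formula then yields $\max\{\lceil (\nu(G)+1)/2 \rceil, b(T) - 1\}$.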

Corollary 21. Let $T$ be a shredder with $b(T) \ge k$, and suppose that every $T$-component contains at most $b(T) - 1$ minimal cores. Then, given $T$, an augmenting link set for $G$ of size $\max\{\lceil (\nu(G)+1)/2 \rceil, b(T) - 1\}$ can be found in polynomial time.

Proof. Let $R$ be the side of $T$ that contains $r$, let $W'$ be the set of minimal cores of $G$, and let $W = W' + r$. By Lemma 19, inclusion in the $T$-components induces a partition $\mathcal{W}$ of $W$. Let $F$ be a minimum-cardinality $\mathcal{W}$-connecting $W$-cover; note that $F$ can be computed in polynomial time. By Lemma 19, for any tight set $Y$ of $G$, exactly one of the following holds:

(i) $Y$ is properly contained in a $T$-component or is a union of some but not all $T$-components, and thus $F$ has an edge connecting $Y$ and $Y^*$; or

(ii) $Y$ contains all $T$-components, and thus $F$ has an edge connecting $Y$ to $r$.

Thus $G + F$ is $k$-connected to $r$. Note that $|W| = \nu(G) + 1$, $|\mathcal{W}| = b(T)$, and $\max(\mathcal{W}) \le b(T) - 1 = |\mathcal{W}| - 1$. Hence, by Lemma 20,

$$|F| = \max\{\lceil |W|/2 \rceil, |\mathcal{W}| - 1\} = \max\{\lceil (\nu(G)+1)/2 \rceil, b(T) - 1\}.$$

Consider the following algorithm applied to a shredder $T$ with $b(T) = b(G) \ge k$.

Phase 1: While there exists a $T$-component $X$ containing $b(T)$ minimal cores,
add to $G$ a $(\nu, 2)$-reducing link connecting two cores in $X$.

End While

Phase 2: Add to $G$ a link set as in Corollary 21.
The condition in the loop of Phase 1 ensures that a $(\nu, 2)$-reducing link connecting two cores in $X$ exists. Otherwise, by Theorem 10, the maximal cores contained in $X$ are sides of the same shredder with at least $b(T)$ sides, while $T$ has $b(T) - 1$ sides; this contradicts the maximality of $b(T)$. Consequently, by Corollary 21, the algorithm correctly finds an augmenting link set of size at most $\max\{\lceil (\nu(G)+1)/2 \rceil, b(T) - 1\}$.

To finish the proof of Theorem 11, it remains to show that a shredder $T$ with $b(T) = b(G)$ can be found in polynomial time. In fact, all the shredders can be found in polynomial time, since the number of shredders is at most $(2|V| - 2k + 1)/3$ (see Theorem 22 below). This can be done using the algorithm of Cheriyan and Thurimella [3], who showed that a corresponding problem in a $(k-1)$-connected graph is solvable in polynomial time.

4 Applications

Here we discuss some consequences of the previous sections, starting with an upper bound on the number of shredders. Consider the family $\mathcal{L}$ obtained by picking, for every shredder, its sides and the union of its sides; we color the former blue and the latter red. Let $U$ be the union of the sets in $\mathcal{L}$, and note that $|U| \le |V| - |\Gamma(r)| \le |V| - k + 1$. By Lemma 19, $\mathcal{L}$ is laminar (that is, its members are pairwise noncrossing). It is well known that a laminar family on $U$ has at most $2|U| - 1$ members; thus $|\mathcal{L}| \le 2(|V| - |\Gamma(r)|) - 1$.

We can represent $\mathcal{L}$ as a forest of rooted trees by ordering the sets in $\mathcal{L}$ by inclusion: $X$ is a child of $Y$ if $Y$ is the smallest set in $\mathcal{L}$ properly containing $X$. This forest has the following properties:

(i) every set is either blue or red, but not both;

(ii) the children of every red set are all blue, and there are at least two of them.

Therefore, the number of red sets, which is exactly the number of shredders in the graph, is at most half the number of blue sets. Hence it is at most $|\mathcal{L}|/3$. Thus we have:

Theorem 22. Let $G = (V + r, E)$ be $(k-1)$-connected to $r$. Then the number of shredders in $G$ is at most $(2|V| - 2|\Gamma(r)| - 1)/3 \le (2|V| - 2k + 1)/3$.
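The inclusion forest used in this counting argument can be built mechanically from any laminar family. The sketch below is illustrative only; the function name and the encoding of sets as Python sets of node labels are assumptions. It first rejects crossing pairs. It then makes each set a child of the smallest member that properly contains it. In a laminar family, all supersets of a given set are nested, so that smallest member is unique.

```python
def inclusion_forest(family):
    """Map each index of a laminar `family` to the index of its parent (or None).

    The parent of X is the smallest member properly containing X.
    Raises ValueError if two members cross (laminarity is required).
    """
    sets = [frozenset(s) for s in family]
    for i, a in enumerate(sets):
        for b in sets[i + 1:]:
            if a & b and not (a <= b or b <= a):
                raise ValueError("family is not laminar")
    parent = {}
    for i, a in enumerate(sets):
        supersets = [j for j, b in enumerate(sets) if a < b]
        parent[i] = min(supersets, key=lambda j: len(sets[j])) if supersets else None
    return parent
```

Take a single shredder with blue sides $\{1\}, \{2\}, \{3\}$ and the red union $\{1, 2, 3\}$. The three sides become children of the red set. This gives one red set among $|\mathcal{L}| = 4$ members, consistent with the bound of at most $|\mathcal{L}|/3$ red sets.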

An edge $e$ of a graph $H$ is critical w.r.t. a certain property if $H$ satisfies this property but $H - e$ does not. Splitting off two edges $su, sv$ means replacing them by a new edge $uv$. Using Theorem 10 it is not hard to prove the following “splitting off” theorem:

Theorem 23. Let $H = (V + r, E)$ be $k$-connected from $V - s$ to $r$, where $s$ is a neighbor of $r$, such that every edge $sv$ of $H$, $v \neq r$, is critical with respect to $k$-connectivity from $V - s$ to $r$. Then either

(i) there exists a pair of edges $su, sv$ with $u, v \in \Gamma_H(s)$ that can be split off while preserving the $k$-connectivity from $V - s$ to $r$, or

(ii) the family of maximal cores of $G = H - s$ is independent.

We note that Theorem 23 is related to (but is also independent of) similar theorems in [1], [9], and [2]. Provided that (A) $\deg(s) \ge k + 2$ and (B) $|V| \ge 2k$, these theorems characterize when there exists a pair of edges incident to $s$ that can be split off while preserving a given connectivity property:

- “global” $k$-connectivity, in [1] and [9,10];
- $k$-connectivity from $\mathcal{R}$ to $s = r$, in [2].

Our Theorem 23 considers a different but related setting. It gives a necessary and sufficient condition without restrictions (A) and (B). However, if (A) holds, then our characterization takes a form similar to the one given in [2, Theorem 3].
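The splitting-off operation itself is purely local. Below is a minimal sketch. It assumes the multigraph is stored as a `collections.Counter` that maps unordered node pairs to edge multiplicities; this representation is chosen for illustration only. Deciding whether a split preserves connectivity, the subject of Theorem 23, is not attempted here.

```python
from collections import Counter


def split_off(edges, s, u, v):
    """Return a copy of multigraph `edges` with edges su and sv replaced by uv.

    `edges` is a Counter mapping frozenset({x, y}) to an edge multiplicity.
    """
    su, sv = frozenset((s, u)), frozenset((s, v))
    new = Counter(edges)
    new[su] -= 1
    new[sv] -= 1  # if u == v, a second parallel copy of su is consumed
    if new[su] < 0 or new[sv] < 0:
        raise ValueError("edges su and sv must both be present")
    new[frozenset((u, v))] += 1
    return +new  # unary plus drops zero counts
```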

Let us call a sequence $F^* = (e_1, \ldots, e_p)$ of links $(\nu, 2)$-reducing for $G$ if $e_i$ is $(\nu, 2)$-reducing for $G + \{e_1, \ldots, e_{i-1}\}$, $i = 1, \ldots, p$. Let $\zeta(G)$ be the maximum length of a $(\nu, 2)$-reducing link sequence for $G$. A link set is basic if each of its links connects two minimal cores of $G$ or connects a minimal core of $G$ to $r$. It is easy to see that there exists a basic augmenting link set of size $\mathrm{opt}(G)$. Using Theorem 10 and Proposition 8, we can prove the following theorem:

Theorem 24. Among all basic augmenting link sets of size $\mathrm{opt}(G)$, let $F$ be one with the maximal number of links incident to $r$. Then an arbitrary ordering of the links in $F$ that are not incident to $r$ is a $(\nu, 2)$-reducing sequence for $G$ of maximal length. Thus $\mathrm{opt}(G) = \nu(G) - \zeta(G) \ge \beta(G)$.

Note that computing a maximum-length $(\nu, 2)$-reducing sequence for $G$ is not equivalent to finding a maximum matching in the graph induced on the minimal cores by the $(\nu, 2)$-reducing links. Formally, the nodes of this graph are the minimal cores of $G$, and two cores are connected by an edge if and only if there exists a $(\nu, 2)$-reducing link connecting them; see Example 4 in Section 3.1.
1 Introduction

Motivation. Scheduling and packet-routing have emerged as important problems in modern computer and communication systems. In this paper, we consider such problems in the setting of an arbitrary synchronous, adversarial network. In an adversarial network, the nature of the incoming traffic is decided by an adversary, operating under a reasonable rate restriction. Such networks have attracted attention in recent years, as they appear to be a convenient and useful way to model packet injections into a communication network. In addition, these networks inspire algorithm developers to design robust algorithms that provide a performance guarantee regardless of the nature of the incoming traffic. Thus, the adversarial input model provides a valuable, complementary point of view to that of the more traditional stochastic model.

Problem description. The communication network is modeled by a directed graph $G = (V, E)$ in which the nodes represent processors and the arcs (or edges) represent links between processors. Two natural models arise, depending
on whether the adversary specifies a route for the packets she injects: In the non- 
adaptive (or circuit-switched) model, the algorithm is required to route a packet 
along the path specified by the adversary; in the adaptive (or packet-switched) 
model, the adversary specifies only the origin and destination for each packet, 
but does not specify a path. In this case, the algorithm is free to route a packet 
along any path from its origin to its destination. 

Packets are injected by an adversary subject to a natural rate restriction specified in terms of two parameters $r$ and $w$.

For the non-adaptive model, the packets injected by the adversary (and their associated paths) should be such that, in any time window of size $w$, the number of packets injected during this window requiring any arc is at most $\lfloor rw \rfloor$.

For the adaptive model, the analogous restriction is that the adversary must be able to associate paths to the packets injected in any time window of size $w$ such that the number of packets requiring any arc is at most $\lfloor rw \rfloor$. This condition can be conveniently captured by an associated integer multicommodity flow problem having an optimal value at most $\lfloor rw \rfloor$.
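In the non-adaptive model the paths are fixed, so the rate restriction can be checked by direct counting. The adaptive and fractional versions instead require solving a flow problem, which this sketch does not attempt. The sketch assumes injections are given as (time, path) pairs with integer times, where each path is a list of arcs; the function name is hypothetical.

```python
from collections import defaultdict
from math import floor


def respects_nonadaptive_rate(injections, w, r):
    """Check that every window [s, s + w) puts at most floor(r * w) packets on any arc.

    `injections` is a list of (time, path) pairs; a path is a list of arcs.
    """
    cap = floor(r * w)
    if not injections:
        return True
    times = [t for t, _ in injections]
    # Only windows that contain at least one injection can violate the bound.
    for start in range(min(times) - w + 1, max(times) + 1):
        load = defaultdict(int)
        for t, path in injections:
            if start <= t < start + w:
                for arc in set(path):  # each packet counts once per arc
                    load[arc] += 1
        if any(count > cap for count in load.values()):
            return False
    return True
```

If $r$ is a rational number such as 3/10, passing a `fractions.Fraction` avoids floating-point surprises in `floor(r * w)`.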

In this paper we focus on the adaptive model, although most of our results 
can be extended to the non-adaptive model as well, with virtually no changes. 
In fact, we focus on the adaptive model in which the adversary is allowed to 
split packets and route them using multiple paths. Essentially, the restriction 
on the adversary translates to an associated fractional multicommodity flow 
problem having an optimal value at most rw. For this model, we consider the 
problem of designing effective routing/scheduling algorithms. Our main result is 
a simple algorithm for this problem that is stable (bounded number of packets 
in the system), with a bound on the number of packets in the system that is 
$O(w/(1-r))$ for any fixed network $G$. This implies a worst-case delay bound on packets that is relatively small as well. A noteworthy feature of this result is that it matches the traditional queueing-theoretic number-in-system bound, which is usually $O(1/(1-r))$. In the rest of this paper, we assume a fixed network $G$,
and so we often omit the dependence of the bounds on the network parameters. 

Related work. Adversarial networks have received a lot of attention in recent 
years. They were first introduced by Borodin et al. [9], and further elaborated by 
Andrews et al. [3,4]. Later, these were seen to be non-trivial generalizations of 
earlier models of Cruz [10]. The original papers of Borodin et al. [9] and Andrews 
et al. [3,4] contain a wealth of interesting results, but mostly on the non-adaptive 
case. 

The models most closely related to our work were first introduced by Aiello et al. [2]. In their work, they provided an elegant extension of the restriction on the adversary, which had previously been considered only for the non-adaptive case. Furthermore, they constructed a distributed protocol with the number of packets in the system being $O(w/(1-r))$. Their results were derived for the integer $(w, r)$ adversary.

Motivated by the observation that this restriction is not efficiently checkable, Gamarnik [12] introduced the fractional $(w, r)$ adversary. Here, the adversary is allowed to associate fractional paths (“flows”) to the packets to satisfy the load condition. An interesting question, then, is to quantify the performance loss due to the increased power given to the adversary. Gamarnik [12] constructed an algorithm such that the number in system is $O(w^2/(1-r)^2)$. Furthermore, he observed that a naive adaptation of the methods of Aiello et al. [2] can at best lead to a bound of $O(1/(1-r)^2)$.

In more recent work, Andrews et al. [5] derive distributed source routing and 
scheduling algorithms with polynomial delay bounds using a discrete-review-like
strategy; these delay bounds obviously translate to bounds on the number-in-
system. The algorithm described in this paper can also be viewed as a source 
routing/scheduling algorithm, as the route for a packet is determined at its 
source; the queue-length bounds we prove are stronger than those implicit in [5], 
but our algorithm is centralized. For the special case in which there is only a 
single destination, stronger bounds are known [6]. 
Results. For the dynamic adaptive packet routing problem in an adversarial queueing network with a fractional $(w, r)$ adversary, we design an efficient algorithm that keeps the queue lengths bounded. Specifically, we show that the number of packets in the system at any time $t$, $Q(t)$, satisfies

$$Q(t) \le \frac{m\,(m + 2n + \sqrt{\cdots} + w)}{1 - r}, \qquad (1)$$

where $m$ and $n$ are the numbers of arcs and nodes in the network. This matches the known bound (as a function of $w$ and $r$) for the same problem with an integer $(w, r)$ adversary. Our results immediately imply small delay bounds for the packets as well.

Our bounds obviously apply in the special case when rates are associated with origin-destination pairs. Specifically, suppose packets for a particular origin-destination pair $i, j$ arrive at rate $\nu_{ij}$. As long as an associated fractional multicommodity flow problem has optimal value at most 1, we can find a scheduling policy with the number of packets bounded by the expression (1). Here $r$ can be explicitly determined from the $\nu_{ij}$ and the network topology alone.

Our results are achieved by a combination of techniques: we use a discrete 
review policy, which reduces the dynamic scheduling and routing problem to a 
sequence of static, adaptive packet routing problems; using a rounding theorem 
due to Karp et al. [13], we reduce each of these problems to a non-adaptive packet
scheduling problem; these packet scheduling problems can be solved effectively 
using algorithms due to Bertsimas and Sethuraman [8] or Sevastyanov [14,15,16]. 

The rest of this paper is structured as follows: in Section 2 we describe the 
model in more detail; Section 3 describes the scheduling/routing algorithm, and 
formally specifies the details in each of the steps informally outlined above. 



2 Model 

The model we consider is the “adversarial queueing network” model advocated by Borodin et al. [9], as modified by Aiello et al. [2]. We refer the reader to these original papers for a thorough motivation of the adversarial model. The basic model used throughout this paper can be described as follows.

The communication network is modeled by a directed graph $G = (V, E)$, with $|V| = n$ and $|E| = m$. This network is populated by packets, which originate in some node of the network and need to reach some other node of the network. Associated with each arc $(u, v)$ is an infinite buffer that stores the packets requiring the arc $(u, v)$. We assume a synchronous network, in which time is divided into steps, conveniently numbered by the non-negative integers and indexed by $t$. Packets require unit time to traverse an arc, and each arc can process at most one packet in a time step.

Packets are injected into the network by an adversary operating under a 
restriction specified in terms of two parameters r and w. Restrictions of this sort 
were first considered in [9,3,4] for the non-adaptive version, and were extended in 
an elegant way by Aiello et al. [2] to the adaptive version as follows. Let $A_{ij}[t_1, t_2)$ be the set of packets injected into the network during the time interval $[t_1, t_2)$ with origin $i$ and destination $j$, and let

$$A[t_1, t_2) = \bigcup_{i, j \in V} A_{ij}[t_1, t_2).$$

An adversary is an integer $(w, r)$ adversary, for some $r$ ($0 < r < 1$) and some integer $w \ge 1$, if and only if, for any $t$, the adversary can associate a path to each packet in $A[t, t + w)$ such that every arc belongs to at most $\lfloor rw \rfloor$ paths. (Note that the adversary is not constrained to have a single path in her mind for the packets she injects. A packet $p$ injected at time $t$ will belong to $w$ different time windows; the adversary is allowed to associate different paths to packet $p$ at the time instants $t - w + 1, t - w + 2, \ldots, t - 1, t$.)

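Consider the following integer multicommodity flow problem:

$$
\begin{aligned}
\text{(IMF)}\quad \min\ & C(t)\\
\text{s.t.}\ & \sum_{l:(k,l)\in E} x^{ij}_{kl} - \sum_{l:(l,k)\in E} x^{ij}_{lk} =
\begin{cases} |A_{ij}[t,t+w)| & \text{if } k = i,\\ -|A_{ij}[t,t+w)| & \text{if } k = j,\\ 0 & \text{otherwise},\end{cases} && \forall\, i, j, k \in V,\\
& \sum_{i,j \in V} x^{ij}_{kl} \le C(t), && \forall\, (k,l) \in E,\\
& x^{ij}_{kl} \ge 0,\ \text{integer}.
\end{aligned}
$$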
Here $x^{ij}_{kl}$ represents the number of packets that travel from node $i$ to node $j$ and use the arc $(k, l)$. It is easy to see that an adversary is an integer $(w, r)$ adversary if and only if, for every $t$, the optimal value $C^*(t)$ of (IMF) is at most $\lfloor rw \rfloor$. Since the integer $(w, r)$ adversary is defined in terms of an integer multicommodity flow problem, it is NP-complete to check whether or not an input stream generated by an adversary respects the restrictions imposed.

To overcome this limitation, Gamarnik [12] considered a model in which the adversary is allowed to split packets. An adversary is a fractional $(w, r)$ adversary, for some $r$ ($0 < r < 1$) and some integer $w \ge 1$, if and only if, for any $t$, the adversary can fractionally schedule (or associate flows with) all the packets in $A[t, t + w)$ such that the load on each arc is at most $rw$. Equivalently, an adversary is a fractional $(w, r)$ adversary if and only if the linear programming relaxation of (IMF) has optimal value at most $rw$. The fractional $(w, r)$ adversary is less constrained, and hence can generate input streams that are inadmissible for the integer $(w, r)$ adversary.
For the integer $(w, r)$ adversary, Aiello et al. [2] constructed a routing and scheduling policy for which the total number of packets in the system is

$$O\left(\frac{w}{1 - r}\right).$$

In fact, their algorithm is distributed and uses only local information. Gamarnik [12] designed a centralized algorithm for the fractional $(w, r)$ adversary for which the total number of packets in the system is

$$O\left(\frac{n^2 m^2 + w^2 m}{(1 - r)^2}\right).$$

Gamarnik [12] left open the problem of designing an algorithm for which the total number of packets in the system is $O(w/(1-r))$, matching the bound of Aiello et al. [2] for the integer $(w, r)$ adversary. Our main result is an algorithm with this performance bound. We achieve this using a combination of techniques that have proved to be useful in a host of other problems. These include a scheduling algorithm for large job shop scheduling problems due to Bertsimas and Sethuraman [8], and the rounding theorem due to Karp et al. [13].

To avoid ambiguity, we specify explicitly the sequence of events occurring at 
any time step: first, packets traverse arcs; next, the adversary injects new packets 
into the nodes; and finally, packets that reach their destination are absorbed by 
the corresponding node. 



3 The Routing and Scheduling Algorithm 

An overview of the algorithm is as follows: 

(a) The dynamic routing and scheduling problem in adversarial networks can 
be (approximately) solved as a sequence of static, adaptive packet routing 
problems; 

(b) Each of these adaptive packet routing problems can be (approximately) 
solved as a (non-adaptive) packet scheduling problem with a small number 
of paths; 

(c) Each of these packet scheduling problems can be (approximately) solved; 
and 

(d) the performance loss in each of these steps is relatively negligible. 

The rest of this section is devoted to showing the details involved in each of 
these steps. 

Reduction to static, adaptive packet routing. The dynamic routing and scheduling problems in adversarial queueing networks can be reduced to a sequence of adaptive packet routing problems by using discrete review policies. In any such policy, the system is reviewed at discrete points in time, say, at

$$T_0, T_1, T_2, \ldots, T_i, T_{i+1}, \ldots$$
Policies differ in the way in which the review epochs are picked. We shall not expand on this point any further, because our algorithm picks these review epochs in a natural way, as described below.

Suppose $T_i$ is a review epoch chosen by our algorithm. At $T_i$, we solve an adaptive packet routing problem, with the inputs given by $A_{kl}[T_{i-1}, T_i)$, $k, l \in V$. In other words, the packets considered by the algorithm at time $T_i$ are precisely those that were injected into the network at or after the previous review epoch and before $T_i$. These are routed to their respective destinations using a “good” adaptive packet routing algorithm. The epoch at which all of these packets have been routed to their destinations defines the next review epoch $T_{i+1}$. Note that packets that arrive at or after $T_i$ are ignored by the adaptive packet routing algorithm until $T_{i+1}$. Clearly, the review epochs chosen are a function of the adaptive packet routing algorithm used, and the effectiveness of such a policy will critically depend on how good the adaptive packet routing algorithm actually is. We shall analyze this next.

At the epoch $T_i$, we shall process all the packets that arrived during the interval $[T_{i-1}, T_i)$. Let $W_i$ be the optimal value of the associated fractional multicommodity flow problem. It is clear that every algorithm will require at least $W_i$ units of time to process this input. Specifically, in the absence of arrivals at or after $T_i$, no algorithm can process all of the input by time $t < T_{i-1} + W_i$.

Suppose our adaptive packet routing algorithm is able to route all of these packets to their destinations in at most $W_i + f$ steps, for some (constant) $f$ that depends only on $m$ and $n$, but not on the input to the packet routing problem. (It is important that $f$ be independent of $W_i$.) Thus, $f$ is a measure of the inefficiency of the adaptive packet routing algorithm, and bears directly on the amount of “work” seen by the algorithm at the next review epoch. Given this, how large can $W_{i+1}$ be? Clearly, $W_{i+1}$ represents the maximum load on any arc due to arrivals in $[T_i, T_{i+1})$, which by our assumption is contained in $[T_i, T_i + W_i + f)$.
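The interval $[T_i, T_{i+1})$ can be covered by $\lceil (T_{i+1} - T_i)/w \rceil$ windows of size $w$, each carrying a load of at most $rw$. Therefore,

$$W_{i+1} \le \left\lceil \frac{T_{i+1} - T_i}{w} \right\rceil rw \le \left(\frac{T_{i+1} - T_i}{w} + 1\right) rw \le r(T_{i+1} - T_i) + w, \qquad (2)$$

where the last inequality holds since $r < 1$.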

A recursive application of Eq. (2), together with $T_{i+1} - T_i \le W_i + f$, implies

$$\limsup_{i \to \infty} W_i \le \frac{f + w}{1 - r}.$$





Thus, letting $Q(t)$ denote the total number of packets in the system at time $t$, we have

$$Q(t) \le m \limsup_{i \to \infty} W_i \le \frac{m(f + w)}{1 - r}. \qquad (3)$$

Thus, the dynamic routing/scheduling problem in an adversarial queueing 
network can be solved as a sequence of static, adaptive packet routing problems, 
as long as each of these problems is solved relatively well; in particular, the 
queue-length bound of Eq. (3) will hold as long as the makespan of the static,
adaptive packet routing problem is within an additive constant of the associated
congestion lower bound.
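The recursion behind Eq. (2) is easy to check numerically. Taking the worst case $T_{i+1} - T_i = W_i + f$ turns Eq. (2) into $W_{i+1} \le r(W_i + f) + w$. Its fixed point is $(rf + w)/(1 - r)$, which is at most $(f + w)/(1 - r)$. The sketch below iterates this worst case and evaluates the right-hand side of Eq. (3). The parameter values are arbitrary and only illustrate the convergence; the function names are hypothetical.

```python
def iterate_worst_case(w, r, f, w0, steps=200):
    """Iterate W_{i+1} = r * (W_i + f) + w, the worst case allowed by Eq. (2)."""
    load = w0
    for _ in range(steps):
        load = r * (load + f) + w
    return load


def queue_bound(m, f, w, r):
    """Right-hand side of Eq. (3): m * (f + w) / (1 - r)."""
    return m * (f + w) / (1 - r)


# With w = 10, r = 0.5, f = 6 the fixed point is (0.5 * 6 + 10) / 0.5 = 26,
# below the cruder bound (f + w) / (1 - r) = 32, whatever the starting load.
limit = iterate_worst_case(w=10, r=0.5, f=6, w0=1000)
```

The contraction factor is $r$. The distance to the fixed point therefore shrinks geometrically, and $W_0$ has no effect on the limit.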

Identifying a small set of “good” paths. Our goal now is to consider a static, adaptive packet routing algorithm. Let $t$ be a review epoch, and let $A_{ij}$ be the number of packets in the system with origin $i$ and destination $j$ at time $t$. Let $W_t$ be the optimal value of the (fractional) multicommodity flow problem, that is, the linear programming relaxation of (IMF), defined by the packets present in the system at time $t$. Let $x$ be an optimal solution. Note that, without loss of generality, we can assume that $A_{ij} > 0$. Given $x$, we can also assume that there is no cycle with positive flow. Hence we can decompose the solution (arc flows) into flows along paths $P_k$, $k = 1, \ldots, K$, with the (fractional) flow value on path $P_k$ being $y_{P_k}$, and such that

$$\sum_{k : (i,j) \in E(P_k)} y_{P_k} = \sum_{u, v \in V} x^{uv}_{ij} \le W_t \quad \forall (i, j) \in E$$

and

$$\sum_{k : o(P_k) = i,\, d(P_k) = j} y_{P_k} = A_{ij} \quad \forall i, j \in V.$$

In the expressions above, $o(P_k)$ and $d(P_k)$ denote the origin and destination of path $P_k$. We refer the reader to Ahuja et al. [1] for a discussion of flow decomposition.
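Flow decomposition is standard. For a single commodity (one pair $i, j$) with no positive-flow cycles, it can be sketched as follows: repeatedly walk a positive-flow path from origin to destination and subtract its bottleneck value. The sketch below is illustrative; the dictionary encoding of arc flows and the function name are assumptions. Each round deletes at least one arc, so at most $m$ paths are produced for each commodity.

```python
def decompose_paths(flow, source, sink, eps=1e-12):
    """Split an acyclic single-commodity arc flow into (node path, value) pairs.

    `flow` maps arcs (u, v) to nonnegative values and must satisfy flow
    conservation at every node other than `source` and `sink`.
    """
    residual = {arc: val for arc, val in flow.items() if val > eps}
    paths = []
    while True:
        out = [arc for arc in residual if arc[0] == source]
        if not out:
            break
        path = [out[0]]
        while path[-1][1] != sink:  # follow positive-flow arcs to the sink
            node = path[-1][1]
            path.append(next(arc for arc in residual if arc[0] == node))
        amount = min(residual[arc] for arc in path)
        for arc in path:
            residual[arc] -= amount
            if residual[arc] <= eps:
                del residual[arc]  # the bottleneck arc always disappears
        paths.append(([source] + [arc[1] for arc in path], amount))
    return paths
```

Running this once for each origin-destination pair, on the arc values $x^{ij}_{kl}$, yields the path flows $y_{P_k}$ used above.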

Our task now is to select precisely $A_{ij}$ paths from $i$ to $j$ without adversely affecting the congestion along any arc. In other words, we need to round the fractional path solution to an integral 0-1 solution in a suitable manner. We do this by using the following rounding algorithm of [13]:

Theorem 1 ([13]). Let $A$ be a real-valued $s_1 \times s_2$ matrix, and let $y$ be a real-valued $s_2$-vector. Let $b$ be a real-valued vector such that $Ay = b$, and let $t$ be a positive real number such that, in every column of $A$, (i) the sum of all the positive entries is at most $t$, and (ii) the sum of all the negative entries is at least $-t$. Then we can compute an integral vector $\bar{y}$ such that, for every $i$, either $\bar{y}_i = \lfloor y_i \rfloor$ or $\bar{y}_i = \lceil y_i \rceil$, and $A\bar{y} = \bar{b}$, where $\bar{b}_i - b_i < t$ for all $i$. Furthermore, if $y$ contains $d$ nonzero components, the integral approximation can be obtained in time polynomial in $s_1$, $s_2$, and $d$ (see [13] for the precise bound).

To use Theorem 1, we first transform our linear system above to the following equivalent form:

$$\sum_{k : (i,j) \in E(P_k)} y_{P_k} \le W_t \quad \forall (i, j) \in E,$$

$$\sum_{k : o(P_k) = i,\, d(P_k) = j} (-m)\, y_{P_k} = -m A_{ij} \quad \forall i, j \in V.$$

The set of variables above is $\{y_{P_k} : k = 1, \ldots, K\}$. Note that $y_{P_k} \in [0, 1]$ for all these variables. Furthermore, in this linear system, the positive column sum is bounded by the maximum length of the paths, which in turn is bounded by $m$, the number of arcs in the graph. The negative column sum is bounded below by $-m$, since each path contributes the entry $-m$ to exactly one row. Thus, the parameter $t$ for this linear system, in the notation of Theorem 1, can be taken to be $m$. Hence, by Theorem 1, we can obtain in polynomial time an integral solution $\bar{y}$ satisfying

$$\sum_{k : (i,j) \in E(P_k)} \bar{y}_{P_k} < W_t + m \quad \forall (i, j) \in E(G),$$

$$\sum_{k : o(P_k) = i,\, d(P_k) = j} (-m)\, \bar{y}_{P_k} < -m A_{ij} + m \quad \forall i, j \in V.$$
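For each $i, j$, dividing the second inequality by $-m$ gives

$$\sum_{k : o(P_k) = i,\, d(P_k) = j} \bar{y}_{P_k} > A_{ij} - 1.$$

Note the crucial role of the strict inequality: since the left-hand side is an integer, it is at least $A_{ij}$. Thus, we have selected at least $A_{ij}$ paths from $i$ to $j$. Furthermore, the congestion along every arc is bounded by $W_t + m$.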

To summarize what we have achieved: starting from an arc flow solution, we 
used flow decomposition and an application of the rounding theorem to derive 
an integer solution such that the load on any arc is increased by at most m. 
Each “commodity” (i.e., origin-destination pair) is now routed along at most
$m$ paths. We can now reformulate this adaptive packet routing problem as a
(non-adaptive) packet scheduling problem as follows: think of each path from $i$
to $j$ as a type, and assume that $\bar{y}_{P_k}$ packets have to be sent from $i$ to $j$ along path
$P_k$. (To avoid cumbersome notation, we have dropped the dependence of $y$ on
the origin-destination pair.) In essence, we have used the rounding algorithm to 
compute a small set of good paths for the adaptive packet routing problem; we 
now pretend that the problem to be solved is really a packet scheduling problem 
in which an explicit path is associated with each packet; the number of packets to 
be routed along a given path is determined by applying the rounding algorithm 
on an optimal (fractional) multicommodity flow solution. 

Solving the packet scheduling problem. The dynamic routing/scheduling
problem on an adversarial network is now reduced to a simpler, static, packet
scheduling problem. For convenience, we describe the input to this packet
scheduling problem slightly differently. The packet scheduling problem consists of K
types of packets; packets of type k require a path P_k through the network, are
initially available at o(P_k) ∈ V, and need to reach d(P_k) ∈ V; there are n_k
packets of type k. The objective is to find a schedule for all of these packets that
minimizes makespan. Each packet requires unit time to traverse an arc; each arc
can process one packet per unit time. Obviously, this is an NP-hard problem.
Fortunately, we do not need to find an optimal schedule; all we need is a schedule
with makespan within an additive constant of the associated congestion lower
bound. Note that this additive constant may depend on m, n, and K, but cannot
depend on n_1, n_2, ..., n_K themselves: in the packet scheduling instances that
arise in the solution of the adversarial network problem, m, n, and K are
independent of r and w, the parameters of the adversary, whereas the n_k
depend on r and w. We briefly outline two solution methods for this
packet scheduling problem, and specify the corresponding bounds.
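The two lower bounds on the makespan can be made concrete with a short Python sketch. It assumes, for illustration only, that an instance is given as a list of paths (each a list of hashable arc labels) together with the packet counts n_k; the function name is hypothetical. Since each arc carries one packet per unit time and each packet crosses one arc per unit time, every schedule needs at least max(W, D) steps, where W is the congestion and D the dilation.

```python
from collections import Counter


def congestion_and_dilation(paths, counts):
    """Return (W, D) for a packet scheduling instance.

    paths[k]  -- list of arcs forming path P_k (illustrative input format)
    counts[k] -- number n_k of type-k packets
    W is the maximum number of packets that must cross any single arc;
    D is the length of the longest path.
    """
    load = Counter()
    for path, n_k in zip(paths, counts):
        for arc in path:
            load[arc] += n_k  # every type-k packet crosses each arc of P_k
    W = max(load.values())
    D = max(len(path) for path in paths)
    return W, D


paths = [["a", "b"], ["b", "c"], ["a", "c"]]
counts = [3, 2, 2]
print(congestion_and_dilation(paths, counts))  # (5, 2)
```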

Fluid synchronization algorithm. The packet scheduling problem outlined 
here is a special case of the job shop scheduling problem with the makespan 
objective considered by Bertsimas and Sethuraman [8]. In that work, they con- 
sider a fluid relaxation of the job shop scheduling problem, which can be viewed 
as a continuous analog of the discrete job shop scheduling problem. Using an 
optimal solution to the fluid relaxation, they find nominal start times for each
packet at each of the arcs it has to visit; these nominal start times are carefully 
constructed in a recursive manner, based on both the optimal fluid solution and 
the partial discrete schedule. 

More precisely, suppose type k packets need to visit arcs a_{k,1}, a_{k,2}, ..., a_{k,|P_k|}
in that order. Suppose W is the maximum load on any arc. The scheduling
algorithm discussed in [8] first determines the fluid start and completion times
for each packet at each stage. The fluid start time FS_{k,j}(n) of the n-th type k
packet at (its) stage j (arc a_{k,j}) is defined to be (n − 1)W/n_k; the corresponding
fluid completion time FC_{k,j}(n) is nW/n_k.

Since the fluid relaxation processes packets continuously, each type k packet is
processed by all its stages simultaneously at a uniform rate n_k/W; for this reason,
the fluid start and completion times for any packet are independent of its “stage,”
and depend only on the packet number. In trying to “round” this fluid schedule
to an implementable discrete schedule, we need to overcome two difficulties: first,
the fluid relaxation treats packets as continuous entities, with the effect that the
same packet can be “scheduled” by multiple arcs simultaneously; and second,
the fluid relaxation allows arcs to split their effort across multiple packet types,
as long as the overall effort allocated by each arc is at most 1 per unit time.
In other words, the fluid relaxation views both the packets and the processing
resources as being infinitely divisible. (The resulting lower bound is naturally
just the congestion lower bound; the dilation bound does not arise because of
the continuous nature of the jobs.)
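To make the fluid relaxation concrete, the following Python sketch (hypothetical function name, same illustrative input format of paths and counts n_k) computes FS(n) = (n − 1)W/n_k and FC(n) = nW/n_k with exact rational arithmetic, together with the fluid utilization of each arc. It shows two properties used above: the last type-k packet completes at time W, and no arc is loaded beyond rate 1.

```python
from collections import Counter
from fractions import Fraction


def fluid_schedule(paths, counts):
    """Fluid start/completion times and per-arc fluid utilization.

    Returns (W, times, utilization) where times[(k, n)] = (FS, FC) for the
    n-th type-k packet (identical at every stage of P_k), and utilization[a]
    is the total fluid rate sum_k n_k / W assigned to arc a.
    """
    load = Counter()
    for path, n_k in zip(paths, counts):
        for arc in path:
            load[arc] += n_k
    W = max(load.values())
    times = {
        (k, n): (Fraction((n - 1) * W, n_k), Fraction(n * W, n_k))
        for k, n_k in enumerate(counts)
        for n in range(1, n_k + 1)
    }
    # Each type k is processed on every arc of P_k at rate n_k / W, so the
    # fluid load of arc a is load[a] / W, which is at most 1 by definition of W.
    utilization = {arc: Fraction(l, W) for arc, l in load.items()}
    return W, times, utilization


W, times, util = fluid_schedule([["a", "b"], ["b", "c"], ["a", "c"]], [3, 2, 2])
print(W, times[(0, 1)][1], util["c"])  # 5 5/3 4/5
```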

The fluid start of a given packet at a given stage may be viewed as the ideal 
start time of that packet at that stage, but clearly, this is an unrealistic ideal. 
Motivated by the question of defining a more realistic target start time for each 
packet at each stage, Bertsimas and Sethuraman [8] defined nominal start times; 
these are defined in terms of the fluid start and completion times as well as the 
partial discrete schedule. The nominal start time NS_{k,i}(n) of the n-th type k
packet at its stage i (arc a_{k,i}) is defined by

$$
\begin{aligned}
NS_{k,1}(n) &= FS_{k,1}(n), && n \ge 1,\\
NS_{k,i}(1) &= DS_{k,i-1}(1) + 1, && i > 1,\\
NS_{k,i}(n) &= \max\Big\{ NS_{k,i}(n-1) + \frac{W}{n_k},\ DS_{k,i-1}(n) + 1 \Big\}, && n, i > 1,
\end{aligned}
$$

where DS_{k,i−1}(n) is the start time of the n-th type k packet at stage i − 1 (arc
a_{k,i−1}) in the discrete schedule.

Bertsimas and Sethuraman [8] proposed a simple scheduling rule (called the
“fluid synchronization algorithm”) based on these nominal start times: when-
ever a node has to make a processing decision, it schedules an available packet
with the earliest nominal start time. Note that whenever a packet is chosen to
be scheduled at a certain node, its nominal start time at its next stage can
be calculated; so the nominal start times for every packet queued at a node will
be known.
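The following Python sketch simulates the fluid synchronization rule on a small instance, under simplifying assumptions that are ours rather than the paper's: time is discrete, each arc starts at most one packet per unit time, a packet becomes available at its next arc one unit after it starts the current one, and ties in nominal start time are broken by (type, packet number). Nominal start times follow the recursion given earlier, with 0-based stages in the code. This is an illustration, not the implementation of [8]; the function name is hypothetical.

```python
from collections import defaultdict


def fluid_synchronization(paths, counts):
    """Simulate the fluid synchronization rule on unit-time arcs.

    paths[k] is the arc sequence of type k, counts[k] = n_k.
    Returns (W, makespan, start) with start[(k, i, n)] = DS_{k,i}(n),
    using 0-based stages i and 1-based packet numbers n.
    """
    load = defaultdict(int)
    for path, n_k in zip(paths, counts):
        for arc in path:
            load[arc] += n_k
    W = max(load.values())

    ns = {}       # (k, i, n) -> nominal start time NS_{k,i}(n)
    waiting = {}  # (k, n) -> (current stage i, time the packet reaches that arc)
    for k, n_k in enumerate(counts):
        for n in range(1, n_k + 1):
            ns[(k, 0, n)] = (n - 1) * W / n_k  # first stage: NS = fluid start
            waiting[(k, n)] = (0, 0)

    start = {}
    t = 0
    while waiting:
        # Collect the packets available at each arc at time t.
        queues = defaultdict(list)
        for (k, n), (i, arrival) in waiting.items():
            if arrival <= t:
                queues[paths[k][i]].append((ns[(k, i, n)], k, n))
        # Each arc starts the available packet with the earliest nominal start.
        for queue in queues.values():
            _, k, n = min(queue)
            i, _ = waiting[(k, n)]
            start[(k, i, n)] = t
            if i + 1 == len(paths[k]):
                del waiting[(k, n)]  # packet has reached its destination
                continue
            # Nominal start at the next stage (packet n - 1 of the same type
            # was started at this stage earlier, so its value is known).
            if n == 1:
                ns[(k, i + 1, n)] = t + 1
            else:
                ns[(k, i + 1, n)] = max(ns[(k, i + 1, n - 1)] + W / counts[k], t + 1)
            waiting[(k, n)] = (i + 1, t + 1)
        t += 1
    makespan = max(start.values()) + 1
    return W, makespan, start


W, makespan, _ = fluid_synchronization([["a", "b"], ["b", "c"], ["a", "c"]], [3, 2, 2])
print(W, makespan)  # 5 6
```

On this instance the congestion lower bound is W = 5 and the simulated schedule finishes at time 6.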

The main result of [8] adapted to this special case can be stated as follows: 

Theorem 2. Consider a (non-adaptive) packet scheduling problem with K job
types and m arcs. Given initially n_k jobs of type k = 1, 2, ..., K, suppose the
maximum load on any arc is W, and let W* be the optimal makespan. Then,
the fluid synchronization algorithm produces a schedule with makespan W_D
such that

$$
W \le W^* \le W_D \le W + n(K + 2). \qquad (4)
$$

Sevastyanov’s algorithm. In the mid-seventies, interesting approximation al- 
gorithms were derived for several shop scheduling problems. These algorithms 
were based on beautiful, geometric arguments, and were discovered indepen- 
dently by Belov and Stolin [7], Sevastyanov [14], and Fiala [11]. These methods 
constructed schedules for job shop scheduling problems with an additive error 
term that depended only on the number of machines, and the maximum pro- 
cessing time of a job, but not on the number of jobs. Since it is not central to 
this paper (and in the interest of space), we do not discuss these algorithms in 
detail; we refer the interested reader to the original papers cited earlier as well 
as the excellent survey of Sevastyanov [17]. The strongest of these results, due to
Sevastyanov [15,16], provides a schedule of length at most (n − 1)(mn² + 2n − 3).

Remark. Note that depending on K, this may or may not be better than the 
schedule provided by the fluid synchronization algorithm. For the adaptive case, 
it is seen that the guarantee provided by the fluid synchronization algorithm is 
slightly better than the one provided by Sevastyanov’s algorithm. Moreover, the 
fluid-based algorithm is not computationally intensive at all, and is very simple 
to implement. On the other hand, for the non-adaptive case, the adversary may 
insist that the algorithm route packets along exponentially many paths; in this 
case, the guarantee provided by the fluid-based method is unattractive, and 
Sevastyanov’s method is clearly better. 

The main result. Our main result is obtained by putting all of these steps
together. Fix a review epoch i, with W_i being the work seen by the scheduler
at this epoch. Then, step 2 results in an instance of the non-adaptive packet
scheduling problem with maximum congestion at most W_i + m; using the fluid
synchronization algorithm for this packet scheduling problem results in a
schedule with length at most W_i + m + n(K + 2). Noting that there are at most n²
commodities, and that each of them may use at most m paths (so that K ≤ mn²),
we conclude that the schedule computed at epoch i will have length at most
W_i + m + mn³ + 2n. Thus, the inefficiency parameter f is at most m + 2n + mn³;
using this in Eq. (3), we have
we have 



$$
Q(t) \le \frac{m(f + w)}{1 - r} \le \frac{m(m + 2n + mn^3 + w)}{1 - r}, \qquad (5)
$$

where Q(t) represents the number of packets in the system at time t.
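As a quick arithmetic check of Eq. (5), the following Python sketch evaluates the bound for given m, n, w, and r. It bounds K by mn² (at most n² origin-destination pairs, each routed on at most m paths), which reproduces f = m + 2n + mn³; the function name is hypothetical.

```python
def queue_length_bound(m, n, w, r):
    """Evaluate the right-hand side of Eq. (5) for m arcs and n nodes."""
    if not r < 1:
        raise ValueError("the bound requires r < 1")
    K_max = m * n ** 2          # at most n^2 commodities, each on at most m paths
    f = m + n * (K_max + 2)     # schedule length beyond W_i: m + n(K + 2)
    assert f == m + 2 * n + m * n ** 3
    return m * (f + w) / (1 - r)


print(queue_length_bound(m=4, n=3, w=2, r=0.5))  # 960.0
```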

For Sevastyanov’s algorithm a similar guarantee can be shown to hold. We 
omit the details. 

Our results can now be formally stated as the following theorem. 



Theorem 3. Consider an adversarial queueing network under a fractional (w, r)
adversary. If r < 1, then the discrete review scheduling policy constructed keeps
the number of packets in the system bounded at all times. In particular, the total
number of packets in the system at time t, Q(t), satisfies

$$
Q(t) \le \frac{m(m + 2n + mn^3 + w)}{1 - r}.
$$

□

An immediate corollary is that for adversarial queueing networks in which
the arrival rate of packets with origin i and destination j is r_ij, an algorithm
for which the number in system is O(w/(1 − r)) can be designed, where r can
be explicitly computed from the rates r_ij using a fractional multicommodity flow
formulation. Gamarnik [12] considered this model and showed that stable policies
exist for this system if and only if the associated fractional multicommodity flow
problem has value at most 1. (The r in the expression for the number-in-system
bound is exactly the optimal solution to this multicommodity flow problem.)

Since the number in system is relatively small, one can expect the proposed 
algorithm to provide good delay guarantees for all the packets as well. This can 
be formally established using the fact that any packet stays in the system for at
most two review periods. Discussion on this topic is deferred to the full version 
of this paper, as is the discussion of results on the non-adaptive version of the 
problem. At this point, we simply note that these techniques lead to excellent 
performance guarantees for the non-adaptive version of the problem as well. 



Future work. Several outstanding questions remain; we point out two explic- 
itly. First, we hope to consider the case r = 1; this seems difficult to understand, 
and may in fact exhibit different behavior depending on whether the adversary 
is fractional (w, r) or integer (w, r) restricted. Moreover, the algorithm we
propose is (semi) centralized, although the queue-length information is used only at
the discrete review epochs. In contrast, Aiello et al. [2] proposed a distributed
algorithm for the integer (w, r) adversary. It would be interesting to design a
distributed algorithm for the problem considered here. We hope to address this in
future work as well.
Abstract. Suppose that we have a set of items and a set of devices, each
possessing two limited resources. Each item requires a given amount of
the resources. Further, each item is associated with a profit and a color,
and items of the same color can share the use of one resource. We need
to allocate the resources to the most profitable (feasible) subset of items.
In an alternative formulation, we need to pack the most profitable subset of
items in a set of 2-dimensional bins (knapsacks), in which the capacity
in one dimension is sharable. Indeed, the special case where we have a
single item in each color is the well-known 2-dimensional vector packing
(2DVP) problem. Thus, the problem that we study is strongly NP-hard
for a single bin, and MAX-SNP hard for multiple bins. Our problem
has several important applications, including data placement on disks in
media-on-demand systems.

We present approximation algorithms as well as optimal solutions for
some instances. In some cases, our results are similar to the best known
results for 2DVP. Specifically, for a single knapsack, we show that our
problem is solvable in pseudo-polynomial time and develop a polynomial
time approximation scheme (PTAS) for general instances. For a natural
subclass of instances we obtain a simpler scheme. This yields the first
combinatorial PTAS for a non-trivial subclass of instances for 2DVP.
For multiple knapsacks, we develop a PTAS for a subclass of instances
arising in the data placement problem. Finally, we show that when the
number of distinct colors in the instance is fixed, our problem admits a
PTAS, even if the items have arbitrary sizes and profits, and the bins
are arbitrary.



1 Introduction 

Consider the following optimization problem. Suppose that we have a set of n 
items and a set of N devices, each possessing a limited supply of two resources. 
Each item requires given amounts of the resources. Further, an item is associated
with a profit, that is obtained if the resources are allocated to that item, and 
a color; items of the same color can share the use of one of the resources. The 
goal is to allocate the resources to a subset of the items, subject to availability 
constraints, such that the overall profit is maximized. 

Formally, suppose that the j-th device, 1 ≤ j ≤ N, has V_j and C_j units of
the first and second resource, respectively. Each item i, 1 ≤ i ≤ n, is associated
with a profit p_i. Also, item i requires s_i units of the first resource and c_i units
of the second resource. We assume that the second resource can be shared by
some items. Specifically, the instance I is partitioned into M sets, by colors; all
items of the same color k, 1 ≤ k ≤ M, require the same amount, c_k, of the
second resource, and can share its use. The goal is to select a feasible most-
profitable subset of items. A subset is feasible if the total allocation from the
first (second) resource on the j-th device does not exceed V_j (C_j), for 1 ≤ j ≤ N.

In an alternative formulation, the above set of items needs to be packed into N
bins (knapsacks); the j-th bin has capacity V_j and C_j compartments. Each item
can be packed in any of the bins. When the first item of color k is packed in
some bin, c_k compartments are allocated to this color; additional items of color
k will be accommodated in the same set of compartments. A packing is feasible
if the total size of the packed items in any bin j is at most V_j, and the total
number of compartments allocated in bin j is at most C_j, 1 ≤ j ≤ N. The goal
is to pack a subset of the items of maximum total profit. Indeed, the special
case where we have a single item in each color is the well-known 2-dimensional
vector packing problem (2DVP). Thus, our problem is strongly NP-hard for a
single bin [10] and MAX-SNP hard for multiple bins, already in the case where
the bins are identical, and the items have unit profits, i.e., p_i = 1 ∀i, 1 ≤ i ≤ n
[16]. We call this problem vector packing with a shareable dimension (VPSD).
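The feasibility conditions of the bin formulation can be stated as a short check. The following Python sketch (hypothetical names and input format) verifies a given assignment of items to bins and returns its profit; the key point is that compartments are charged once per color present in a bin, not once per item.

```python
def vpsd_packing_value(assignment, items, comp_req, V, C):
    """Check a VPSD packing and return its profit (None if infeasible).

    items[i] = (color, size, profit); comp_req[k] = c_k.
    assignment[i] = bin index j, or None if item i is not packed.
    V[j], C[j] = volume and number of compartments of bin j.
    """
    volume = [0] * len(V)
    colors_in_bin = [set() for _ in V]
    profit = 0
    for (color, size, p), j in zip(items, assignment):
        if j is None:
            continue
        volume[j] += size
        colors_in_bin[j].add(color)  # compartments are allocated once per color
        profit += p
    for j in range(len(V)):
        compartments = sum(comp_req[k] for k in colors_in_bin[j])
        if volume[j] > V[j] or compartments > C[j]:
            return None
    return profit


items = [(0, 1, 5), (0, 1, 4), (1, 1, 3)]
print(vpsd_packing_value([0, 0, 0], items, [2, 1], [3], [3]))  # 12
```

Here the two items of color 0 share its two compartments, so all three items fit into a bin with three compartments.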

An important application of VPSD is data placement on disks in media-on-
demand systems [6,12,9]. In such systems (see, e.g., [17,7]), a large database of
M video program files is stored on a centralized server. Each program file k,
1 ≤ k ≤ M, is associated with a number of desired broadcasts of this file, n_k,
and a size (storage requirement), c_k. The files are stored on N shared disks. Each
disk j is characterized by (i) its storage capacity, C_j, that is, the total size of
the files that can reside on it, and (ii) its load capacity, V_j, which is the number
of data streams that can be read simultaneously from that disk. The files need
to be placed on the disks so as to maximize the total number of requests for
broadcasts that can be satisfied simultaneously. In the resulting instance of the
VPSD problem, the bins represent disks, and the items are broadcast requests.
To satisfy a request, some disk has to broadcast a data stream to the client.
This disk must hold a copy of the requested file. Note that storage is a shared
resource that can be used by all the streams broadcasting the same data
from the same disk. Different files may have different sizes; thus, we may have
different c_k values. On the other hand, all the broadcast streams require the
same (non-shareable) bandwidth; thus, we have s_i = 1 for all i.
Other applications of VPSD are production planning and scheduling parallel
tasks (see [13]). Of particular interest in our study is the subclass of uniform
profit/size ratio instances of VPSD, in which, for some a > 0, p_i = a·s_i for all i.
Such instances naturally arise in real systems, where client payments for service
(i.e., item profits) are proportional to the amounts of resources consumed (i.e.,
the item sizes).



1.1 Our Results 

In the following we summarize our main results. 

Single knapsack We show (in Section 2) that VPSD can be optimally solved
in pseudo-polynomial time. We then develop an LP-based PTAS for general
instances. For the subclass of uniform profit/size ratio instances, we develop
(in Section 3) a simpler approximation scheme, based on an extension of a
PTAS proposed in [11] for the classical knapsack problem. By this, we obtain
the first combinatorial PTAS for a non-trivial subclass of instances for 2DVP. In
Section 4, we develop fully polynomial time approximation schemes (FPTAS) for
the subclasses of (i) data placement instances, and (ii) instances with a constant
number of compartment requirements.

Multiple knapsacks We show (in Section 4) that an iterative greedy algorithm
achieves the ratio of (2 + ε) for instances with arbitrary bin sizes, and
(e/(e − 1) + ε) when the bins are identical. A PTAS is developed (in Section 5) for data
placement instances in which the disks are identical (but may have arbitrary
storage and load capacities), and the number of distinct file sizes is fixed. Finally,
for instances in which M, the number of distinct colors, is fixed, we show (in
Section 6) that VPSD admits a PTAS, even if the items have arbitrary sizes and
profits, and the bins are arbitrary.

In our PTAS for a single knapsack (in Section 2), we combine the guessing 
technique of [2] with a novel application of the approximation scheme of [3] to 
the multidimensional multiple choice knapsack problem. In our algorithms for 
uniform ratio instances (in Section 3), we show that a simple greedy algorithm 
and an approximation scheme proposed for the 0/1 knapsack problem can be 
extended to VPSD. The idea is to partially reduce VPSD to the knapsack prob- 
lem, by first considering all the items of each color as a single item. Later, we 
map the grouped items back to the original items. While these extensions do not 
apply for general instances, it may be possible to apply similar ideas for other 
subclasses of VPSD and 2DVP.

Due to space constraints we state some of the results without proofs.^ 



1.2 Related Work 

Packing problems in single dimension have been extensively studied. Since these 
problems are NP-hard, most of the research work in this area focused on finding 



^ The detailed proofs are given in [14]. 
approximation algorithms. The classic 0-1 knapsack problem admits an FPTAS;
that is, for any ε > 0, a (1 − ε)-approximation for the optimal solution can be
found in time polynomial in n and 1/ε [8,5]. In contrast, the multiple knapsack (MK)
problem is known to be strongly NP-hard [4]. Chekuri and Khanna developed in [2]
a PTAS for MK and showed that with slight generalizations this problem becomes
APX-hard.

Packing problems in higher dimensions (also known as d-dimensional vector
packing) are known to be substantially harder to solve, exactly or
approximately. The best known result for a single knapsack is a PTAS due to Frieze and
Clarke [3], for the case where d is a fixed constant. As opposed to the
combinatorial schemes for the single dimension case, the PTAS in [3] uses a linear
program as a procedure. To the best of our knowledge, none of the later published work
on the d-dimensional knapsack problem gives a combinatorial scheme, even for
the case where d = 2. For the case of more than one bin, Woeginger showed in [16]
that 2-dimensional vector packing is MAX-SNP hard (see also [1]). Chekuri
and Khanna presented in [1] a PTAS for the vector scheduling problem, in which
the goal is to schedule a set of jobs, given by d-dimensional vectors, on a set of
machines, so as to minimize the maximum completion time (or makespan) over
all dimensions. The scheme in [1] yields a dual PTAS for d-dimensional vector
packing in more than one bin, where the bins have d equal-sized dimensions, and d
is a fixed constant. The class constrained multiple knapsack (CCMK) problem
introduced in [13] is a special case of VPSD, where c_k = 1 for all 1 ≤ k ≤ M.
The paper [13] presents a PTAS for any instance of CCMK in which M, the
number of distinct colors of items, is fixed.

The data placement problem was initially studied in [12]. The paper presents
an algorithm for the case where all the files are of the same (unit) size, and for
all 1 ≤ j ≤ N, we have the same ratio V_j/C_j for disk j (uniform ratio disks).
The paper shows that the algorithm achieves a ratio of 1 − 1/(1 + C_min) to the
optimal, where C_min = min_j C_j. Golubchik et al. gave in [6] a tighter analysis
of this algorithm and showed that it achieves the ratio 1 − 1/(1 + √C_min)²,
and that this ratio is optimal for any algorithm for this problem. The paper [6]
also presents a PTAS for the data placement problem with unit sized files and
uniform ratio disks. Recently, Kashyap and Khuller [9] studied the problem with
files of Δ distinct sizes, where Δ is fixed. They presented an algorithm that

achieves a ratio of ^1 — 1/(1 -I- \J ~^)^)^ , where file sizes are in {1, . . . , Z\}, 
and C is the storage capacity of the disks. They also showed that this algorithm 
can be combined with an algorithm that runs in polynomial time when C is 
fixed, to get a PTAS for the data placement problem with constant number of 
file sizes. 

2 Approximation Scheme for a Single Bin 

In this section, we discuss the single knapsack version of VPSD. Assume that the
knapsack has volume V and C compartments. We first note that by using
a two-level dynamic programming algorithm, VPSD with a single knapsack can
be solved optimally in pseudo-polynomial time. (The details are given in [14].)
Let P be an upper bound on the total profit (indeed, P ≤ Σ_i p_i).

Theorem 1. VPSD can be solved optimally in O(nP + MP²C) steps.
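The details of the two-level dynamic program are deferred to [14]. The following Python sketch is one natural two-level DP that is consistent with the O(nP + MP²C) bound, assuming integer profits; it is our reconstruction, not necessarily the authors' algorithm, and the function name is hypothetical. The first level computes, for each color and each exact profit q ≤ P, the minimum total size of a subset of that color (O(nP) in total); the second level combines colors, tracking profit and compartments used (O(MP²C)).

```python
import math


def vpsd_single_bin_opt(items, comp_req, V, C):
    """Optimal profit for a single VPSD knapsack (pseudo-polynomial sketch).

    items: list of (color, size, profit) with non-negative integer profits;
    comp_req[k] = c_k. V, C: volume and number of compartments.
    """
    M = len(comp_req)
    P = sum(profit for _, _, profit in items)   # upper bound on total profit
    INF = math.inf

    # Level 1: minsize[k][q] = min size of a color-k subset of profit exactly q.
    minsize = [[INF] * (P + 1) for _ in range(M)]
    for k in range(M):
        minsize[k][0] = 0
    for color, size, profit in items:
        row = minsize[color]
        for q in range(P, profit - 1, -1):      # 0/1 knapsack indexed by profit
            row[q] = min(row[q], row[q - profit] + size)

    # Level 2: F[q][c] = min total size with profit exactly q, using exactly
    # c compartments, over the colors processed so far.
    F = [[INF] * (C + 1) for _ in range(P + 1)]
    F[0][0] = 0
    for k in range(M):
        c_k = comp_req[k]
        G = [row[:] for row in F]               # option: color k is not used
        for q in range(1, P + 1):
            for c in range(c_k, C + 1):
                for p in range(1, q + 1):       # profit taken from color k
                    cand = F[q - p][c - c_k] + minsize[k][p]
                    if cand < G[q][c]:
                        G[q][c] = cand
        F = G

    return max(q for q in range(P + 1)
               if any(F[q][c] <= V for c in range(C + 1)))


items = [(0, 4, 5), (0, 5, 6), (1, 3, 4), (1, 6, 7), (2, 2, 3)]
print(vpsd_single_bin_opt(items, [2, 1, 1], V=10, C=3))  # 12
```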

We now describe a PTAS for a single knapsack. Note that, by the result
of [10], this is the best we can expect, since our problem is strongly NP-hard.
Assume that we know the optimal profit, P, for our instance. We reduce our
problem to the binary 2-dimensional multiple choice knapsack (B2D-MCK) problem.
That is, for given values of P and ε, we define an instance of B2D-MCK whose
optimal solution induces a solution for VPSD with profit at least (1 − ε)P. We
then develop a PTAS for the B2D-MCK problem. By combining the reduction
and the PTAS for B2D-MCK, we get a PTAS for VPSD. Note that P can be
‘guessed’ in polynomial time within factor (1 + ε), using binary search over the
range [max_i p_i, Σ_i p_i].
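A small Python sketch of the candidate values for P: a geometric grid over [max_i p_i, Σ_i p_i] with ratio (1 + ε). Some grid point g satisfies g ≤ P < (1 + ε)g, and since Σ_i p_i ≤ n·max_i p_i there are O(log_{1+ε} n) points, so they can be tried one by one or binary searched. The function name is hypothetical.

```python
def profit_guesses(profits, eps):
    """Geometric grid of guesses for the optimal profit P."""
    lo, hi = max(profits), sum(profits)
    guesses = []
    g = lo
    while g <= hi:
        guesses.append(g)
        g *= 1 + eps  # consecutive guesses differ by a factor (1 + eps)
    return guesses


print(profit_guesses([5, 6, 4, 7, 3], 0.5))  # [7, 10.5, 15.75, 23.625]
```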



Reduction to the B2D-MCK Problem Recall that an instance of B2D- 
MCK consists of a single 2-dimensional knapsack and M sets of items. Each 
item has a 2-dimensional size and is associated with a profit. We need to pack a 
subset of items of maximal total profit. A packing is feasible if it does not exceed 
the volume in any dimension, and at most one item is packed from each set. 

Given the value of P, the parameter ε and a VPSD instance with n items of
M distinct colors, we construct a B2D-MCK instance which consists of a single
2-dimensional knapsack with capacities b_1 = V and b_2 = C, and M sets of items;
each set S_k, 1 ≤ k ≤ M, has R = M/ε items. Each of the items in S_k represents a
subset of the items in the VPSD instance which are of color k, and whose total
profit is rounded down to the next integral multiple of εP/M. In particular,
the j-th item in S_k, denoted (k, j), is given by the triple (s_kj, c_k, p(k, j));
s_kj is the minimal total size of a subset of items of color k whose total profit is
p(k, j) = jεP/M. This total size can be computed using dynamic programming
for the items of S_k with the rounded profits (as in the FPTAS for the classic
knapsack problem [8]).
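The construction of a set S_k can be sketched in Python as follows. This is a minimal sketch under our own reading of the text: each item profit is rounded down to a whole number of units εP/M, and s_kj is taken as the minimum size of a color-k subset whose rounded profit is exactly j units, for j = 1, ..., R with R = ⌈M/ε⌉. Entries that no subset achieves are dropped; the function name is hypothetical.

```python
import math


def mck_set_for_color(items_k, c_k, P, eps, M):
    """Build the B2D-MCK set S_k for one color (illustrative sketch).

    items_k: list of (size, profit) pairs of color k.
    Returns triples (s_kj, c_k, p(k, j)) for the achievable j = 1..R.
    """
    unit = eps * P / M
    R = math.ceil(M / eps)
    s = [math.inf] * (R + 1)   # s[j]: min size with rounded profit exactly j units
    s[0] = 0
    for size, profit in items_k:
        u = int(profit // unit)  # rounded profit of this item, in units
        if u == 0:
            continue             # adds size but no rounded profit
        for j in range(R, u - 1, -1):  # 0/1 knapsack, indexed by profit
            s[j] = min(s[j], s[j - u] + size)
    return [(s[j], c_k, j * unit) for j in range(1, R + 1) if s[j] < math.inf]


print(mck_set_for_color([(4, 5), (5, 6), (3, 10)], 2, P=20, eps=0.5, M=2))
# [(4, 2, 5.0), (3, 2, 10.0), (7, 2, 15.0), (12, 2, 20.0)]
```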

Lemma 1. If there exists a solution with profit P for the VPSD instance, then 
there exists a solution with profit at least (1 — e)P for the binary 2D-MCK 
instance. 



Approximating the Optimal Solution for B2D-MCK Given an instance of
B2D-MCK, we ‘guess’ the set S of most profitable items in the optimal solution,
where |S| = h = min(M, J). Let E(S) be the subset of items with profits
that are larger than the minimal profit of any item in S, that is,
E(S) = {(k, j) ∉ S | p(k, j) > p_min(S)}, where p_min(S) = min_{(k,j)∈S} p(k, j).

We pack all the items (k, j) ∈ S and eliminate from the instance all the items
(k, j) ∈ E(S), and the sets S_k from which an item has been selected. In the next
step we find an optimal basic solution for the following linear program, LP(S):
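$$
\begin{aligned}
\max\ & \sum_{k=1}^{M} \sum_{j=1}^{R} p(k,j)\, x_{kj} \\
\text{s.t.}\ & \sum_{j=1}^{R} x_{kj} \le 1, && k = 1, \dots, M,\\
& \sum_{k=1}^{M} \sum_{j=1}^{R} s_{kj}\, x_{kj} \le V,\\
& \sum_{k=1}^{M} c_k \sum_{j=1}^{R} x_{kj} \le C,\\
& x_{kj} = 1 \text{ for } (k,j) \in S, \quad x_{kj} = 0 \text{ for } (k,j) \in E(S),\\
& x_{kj} \in \{0,1\}, && k = 1,\dots,M,\ j = 1,\dots,R,\ (k,j) \notin S \cup E(S).
\end{aligned}
$$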

In the linear programming relaxation we allow 0 ≤ x_kj ≤ 1. Given an optimal
fractional solution, we get an integral solution by rounding down to 0 the
fractional variables in the solution. The output for B2D-MCK consists of the
items in S and the items (k, j) for which x_kj = 1.

Theorem 2. The above scheme achieves a ratio of (1 − ε) to the optimal
B2D-MCK profit.

Proof. Let x* be an optimal solution for the linear program LP(S), and let S* be
the corresponding subset of items, that is, S* = {(k, j) | x*_kj = 1}. If |S*| ≤ h, then
we are done (the scheme outputs a (1 − ε)-approximation to the optimal profit;
this is due to the initial guess of P); otherwise, let S* = {(k_1, j_1), ..., (k_r, j_r)},
such that p(k_1, j_1) ≥ ... ≥ p(k_r, j_r). Let S_h = {(k_1, j_1), ..., (k_h, j_h)}, and
α = Σ_{ℓ=1}^{h} p(k_ℓ, j_ℓ). Then, for any item (k, j) ∉ S_h ∪ E(S_h), we have p(k, j) ≤ α/h.
Let z*, z denote the optimal solution and the solution output by the scheme,
respectively. We denote by x^B(S_h), x^I(S_h) the basic and integral solutions of
LP(S) as computed by the scheme, for the initial guess S_h. Now, we have that

$$
z^* \le \sum_{k=1}^{M}\sum_{j=1}^{R} p(k,j)\, x^B_{kj}(S_h) \le \sum_{k=1}^{M}\sum_{j=1}^{R} p(k,j)\, x^I_{kj}(S_h) + \delta,
$$

where δ = Σ_{(k,j)∈F} p(k, j), and F is the set of items for which the basic variable
was fractional, that is, F = {(k, j) | x^B_kj(S_h) > x^I_kj(S_h)}.

Recall that in any basic solution of a linear program, the number of non-zero
variables is bounded by the number of tight constraints in some optimal
solution (since non-tight constraints can be omitted). Assume that in the optimal
(fractional) solution of LP(S_h) there are L tight constraints, where 0 ≤ L ≤
M + 2. Then in the basic solution x^B(S_h), at most L variables can be strictly
positive. Thus, at least L − 4 variables get an integral value (i.e., ‘1’), and |F| ≤ 4.
Note that δ ≤ 4α/h, since F ∩ (S_h ∪ E(S_h)) = ∅. Hence, we get that z* ≤ z + δ ≤
3 Instances with Uniform Profit/Size Ratio

In this section we present algorithms for instances with uniform profit/size ratio,
that is, for some a > 0, p_i = a·s_i for all i. Our goal is to pack in a single bin a
subset of the items whose total size is as large as possible. W.l.o.g. we assume
that c_k ≤ C for all k, and s_i ≤ V for all i.

3.1 A Greedy 2-Approximation Algorithm

Let S_k be the total size of items with color k, 1 ≤ k ≤ M. Consider the following
greedy algorithm A_G:

1. Sort the colors such that S_1/c_1 ≥ S_2/c_2 ≥ ... ≥ S_M/c_M.

2. Find in the sorted list the first j colors satisfying Σ_{k=1}^{j} c_k ≤ C and
Σ_{k=1}^{j+1} c_k > C. Let A be the set of all items in the selected colors.

3. Pack the items in A in the knapsack from largest to smallest, ignoring colors,
while there is enough space. Let a_1 denote the total size of items packed
this way.

4. Let k* be the color with maximal total size. Let a_2 be the total size of items
that are packed from color k* when adding items greedily, from largest to
smallest, as long as there is enough space.

5. Select (and pack accordingly) the maximum between a_1 and a_2.
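A Python sketch of A_G follows, returning the packed size max(a_1, a_2), which is proportional to the profit in the uniform-ratio case. It relies on the W.l.o.g. assumption that every c_k ≤ C and s_i ≤ V, and it stops packing at the first item that does not fit, which is one reading of “while there is enough space”. The function name and input format are hypothetical.

```python
def greedy_uniform_ratio(items, comp_req, V, C):
    """Sketch of A_G; items: list of (color, size); comp_req[k] = c_k."""
    M = len(comp_req)
    S = [0] * M                      # S[k]: total size of color-k items
    for color, size in items:
        S[color] += size

    def pack_largest_first(sizes):
        packed = 0
        for size in sorted(sizes, reverse=True):
            if packed + size > V:
                break                # stop at the first item that does not fit
            packed += size
        return packed

    # Steps 1-2: colors by non-increasing S_k / c_k, while compartments last.
    order = sorted(range(M), key=lambda k: S[k] / comp_req[k], reverse=True)
    selected, used = set(), 0
    for k in order:
        if used + comp_req[k] > C:
            break
        selected.add(k)
        used += comp_req[k]

    a1 = pack_largest_first(s for c, s in items if c in selected)   # step 3
    k_star = max(range(M), key=lambda k: S[k])                      # step 4
    a2 = pack_largest_first(s for c, s in items if c == k_star)
    return max(a1, a2)                                              # step 5


items = [(0, 6), (0, 3), (1, 5), (2, 8), (2, 2)]
print(greedy_uniform_ratio(items, [1, 1, 2], V=10, C=2))  # 10
```

In this example the ratio-ordered colors 0 and 1 give only a_1 = 6, while the single largest color gives a_2 = 10; taking the maximum of the two is what yields the factor 2.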



Theorem 3. A_G yields a 2-approximation for uniform-ratio instances.

Proof. If the total size of items in color k* is more than V, then a_2 ≥ V/2;
otherwise, a_2 = S_{k*}. If a_2 ≥ V/2, we are done (since OPT ≤ V). Consider the
case that a_2 = S_{k*}. Since we sort the colors by profit/compartment ratio,
OPT ≤ S_1 + ... + S_{j+1}. If in step 3 we pack all the items of A, then
a_1 = S_1 + ... + S_j, and

$$
\mathrm{Alg} = \max(a_1, a_2) = \max(S_1 + \dots + S_j,\ S_{k^*}) \ge \tfrac{1}{2}(S_1 + \dots + S_j + S_{k^*}) \ge \tfrac{1}{2}(S_1 + \dots + S_{j+1}) \ge \tfrac{1}{2}\,\mathrm{OPT}.
$$

If in step 3 we pack only part of the items, then, since we pack from largest to
smallest, we fill at least half of the volume, which is at least ½OPT.



3.2 Approximation Scheme 

We now describe a PTAS for the uniform ratio case. Our scheme extends the
PTAS of Sahni [11] for the classical knapsack problem. Let k_1, k_2 be constants
(to be determined). Algorithm A proceeds as follows. For any possible selection
of at most k_1 items from I, and for any possible selection of at most k_2 colors
among those that do not appear in the k_1 items, we do the following.

1. Let V' be the remaining volume (V' equals V minus the total size of the
k_1 items). Let C' be the remaining number of compartments (C' equals C
minus the total compartment demand of the k_2 colors and the k_1 items).

2. If this selection of items and colors is infeasible (that is, V' < 0 or C' < 0),
stop; otherwise,

3. Let T be the set of the k_2 selected colors and the colors of the k_1 items.

(a) Pack the k_1 items.

(b) Add the other items of the T colors in arbitrary order as long as there
is enough space.

(c) If there is no space while adding these items, terminate with the packed
items; otherwise,

(d) Sort the colors that do not belong to T such that S_1/c_1 ≥ S_2/c_2 ≥ ...

(e) Add the items of the first color in this order, in arbitrary order, then the
items of the second color, and so on, as long as there are enough space and
enough compartments.
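Algorithm A can be sketched by brute-force enumeration of the guesses, as below. The sketch is ours and only meant for small k_1, k_2: step 3(e) is read as opening a color's compartments when its first item is packed and stopping at the first item (or color) that no longer fits; items are processed in input order where the text says “arbitrary order”; and the result is the best total size over all guesses. The function name and input format are hypothetical.

```python
from itertools import combinations


def ptas_uniform_ratio(items, comp_req, V, C, k1, k2):
    """Sketch of Algorithm A; items: list of (color, size); comp_req[k] = c_k."""
    M = len(comp_req)
    S = [0] * M
    for color, size in items:
        S[color] += size
    best = 0
    for r1 in range(k1 + 1):
        for guessed in combinations(range(len(items)), r1):
            guessed_colors = {items[i][0] for i in guessed}
            others = [k for k in range(M) if k not in guessed_colors]
            for r2 in range(k2 + 1):
                for extra in combinations(others, r2):
                    T = guessed_colors | set(extra)
                    vol = V - sum(items[i][1] for i in guessed)   # V'
                    comp = C - sum(comp_req[k] for k in T)         # C'
                    if vol < 0 or comp < 0:
                        continue                                   # step 2
                    packed = V - vol                               # step 3(a)
                    full = False
                    for i, (color, size) in enumerate(items):      # step 3(b)
                        if color in T and i not in guessed:
                            if size > vol:
                                full = True                        # step 3(c)
                                break
                            vol -= size
                            packed += size
                    if not full:                                   # steps 3(d)-(e)
                        rest = sorted((k for k in range(M) if k not in T),
                                      key=lambda k: S[k] / comp_req[k], reverse=True)
                        for k in rest:
                            if comp_req[k] > comp:
                                break
                            stop, opened = False, False
                            for color, size in items:
                                if color != k:
                                    continue
                                if size > vol:
                                    stop = True
                                    break
                                if not opened:             # allocate c_k once
                                    comp -= comp_req[k]
                                    opened = True
                                vol -= size
                                packed += size
                            if stop:
                                break
                    best = max(best, packed)
    return best


items = [(0, 6), (0, 3), (1, 5), (2, 8), (2, 2)]
print(ptas_uniform_ratio(items, [1, 1, 2], V=10, C=2, k1=0, k2=0))  # 9
print(ptas_uniform_ratio(items, [1, 1, 2], V=10, C=2, k1=0, k2=1))  # 10
```

With no guesses the ratio order fills only 9 units; guessing the single color 2 recovers the optimum of 10, which illustrates why larger k_1, k_2 improve the ratio.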

Theorem 4. For all ki,k 2 , A has approximation ratio R_a < 1 + o,nd 

running time • M^^). 

By selecting k_1 = k_2 = 1/ε we obtain a PTAS. Consider the subclass of 2DVP
instances in which the size of any item i in each dimension is arbitrary, and the
profit p_i is proportional to the size in one dimension. For such instances, we have
a combinatorial approximation scheme, as formalized in the next result.

Corollary 1. Algorithm A is a PTAS for uniform profit/size ratio instances of 
2DVP. 

4 Better Algorithms for Special Instances 

We now show that better approximations or more efficient algorithms can be 
obtained for several subclasses of instances. 

Theorem 5. If the compartment requirement of any color class can take one of
the values η_1, ..., η_w, where w is fixed, then an optimal solution can be computed
^n sleps. 

By scaling the item profits, using the upper bound P on the total profit, we may 
lose only a factor of ε in the approximation ratio.

Theorem 6. There is an FPTAS for VPSD instances with a single knapsack,
in which the compartment requirement of any color class can take one of the
values η_1, ..., η_w, and w is fixed. The running time of the scheme is O(n log n +

e ^ ' 

For instances where all items have the same (unit) size and profit, and each 
color can have an arbitrary compartment requirement, we get an exact polyno- 
mial time algorithm. 

Theorem 7. If s_i = p_i = 1 for all 1 ≤ i ≤ n, an optimal solution can be computed
in O(Mn²) steps.
Recall that a data placement instance is given as M files, each having a specified
size and a broadcast requirement, which takes a value in (0, R]. Thus, for such
instances the optimal algorithm runs in O(MV²) steps. By scaling and rounding
the load capacity of the disk, as well as the broadcast requirements of the files,
we obtain an FPTAS.

4.1 Packing in Multiple Bins: A Greedy Algorithm 

Given an instance of VPSD with N bins, consider an algorithm, A_G, which packs
the bins sequentially. In step j, 1 ≤ j ≤ N, we use an (approximation or exact)
polynomial time algorithm for packing a ‘good’ subset of the remaining items in
bin j. By the analysis of this iterative packing algorithm, as given in [13], and
by the above results for a single knapsack, we get

Theorem 8. A_G is a (2 + ε)-approximation algorithm for VPSD, and a
2-approximation algorithm for instances with unit sizes and profits.

In the special case where all the bins are identical, we can use a result for the 
generalized assignment problem (GAP) in [2] to get better approximation ratios. 



Theorem 9. A_G achieves a ratio of e/(e − 1) + ε for VPSD with identical bins,
and the ratio e/(e − 1) for instances with unit sizes and profits, where e is the
base of the natural logarithm.



5 Approximation Scheme for Data Placement 

In this section, we develop a PTAS for data placement instances in which the 
disks are identical, but may have arbitrary storage and load capacities, and the 
number of distinct file sizes is fixed. 

In terms of the VPSD problem, we consider instances $I$ consisting of $n$ items, of $M$ distinct colors; for any item $i$, $1 \le i \le n$, $p_i = s_i = 1$. The compartment requirement of color $k$ can take one of $w$ possible values, $\eta_1, \ldots, \eta_w$, where $w \ge 1$ is a fixed constant. There are $n_k$ items of color $k$, $1 \le k \le M$. We need to pack a subset of the items in $N$ identical bins, where each bin has volume $V$ and $C$ compartments.

Given a parameter $\varepsilon > 0$, the scheme proceeds as follows. (i) Guess the optimal profit from the packing, $P$, to within a factor of $1 + \varepsilon$. (Recall that $1 \le P \le n$.) (ii) Guess the subset of items that are packed in the bins. (iii) Pack the selected items, distinguishing between items with 'large' and 'small' compartment requirements. In the latter case, we further distinguish between packings of 'large' and 'small' blocks (we define a block below).



Similar to the proof of Theorem 6. We omit the details. 
5.1 Guessing the Packed Items

Given a correct guess of $P$, we omit from the input the items in any color $k$ such that $n_k < \varepsilon P/M$. By that we lose at most a factor of $\varepsilon$ in the approximation ratio. Dividing the value of $n_k$, for each of the remaining colors, by $\varepsilon P/M$, and rounding down to the nearest integral power of $(1+\varepsilon)$, we get an instance in which there are $h = O(\ln(M/\varepsilon))$ distinct $n_k$ values.
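The pruning-and-rounding step above can be sketched in a few lines. This is a minimal illustration under our own assumptions: the per-color counts $n_k$ are given as a plain list, $P$ is the already-guessed optimal profit, and the function name `round_color_counts` is hypothetical, not taken from the paper.

```python
import math


def round_color_counts(n, P, eps):
    """Drop colors with n_k < eps*P/M, then scale each remaining n_k by
    eps*P/M and round it down to an integral power of (1 + eps).

    n   -- per-color item counts n_1..n_M (hypothetical input format)
    P   -- the guessed optimal profit
    eps -- accuracy parameter
    Returns a dict: color index -> rounded scaled count.
    """
    M = len(n)
    unit = eps * P / M
    rounded = {}
    for k, nk in enumerate(n):
        if nk < unit:
            continue  # dropped color: costs at most eps*P profit in total
        ratio = nk / unit  # >= 1
        e = math.floor(math.log(ratio, 1 + eps))
        # correct possible floating-point error in the logarithm
        while (1 + eps) ** (e + 1) <= ratio:
            e += 1
        while (1 + eps) ** e > ratio:
            e -= 1
        rounded[k] = (1 + eps) ** e
    return rounded
```

For instance, with $n = (1, 5, 40, 100)$, $P = 100$ and $\varepsilon = 0.5$, the first two colors are dropped and the other two are rounded to $1.5^2$ and $1.5^5$.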

Now, we partition the item sets into $w$ groups $S_1, \ldots, S_w$ by their compartment requirements. The item sets having the compartment requirement $\eta_\ell$ form the $\ell$th compartment category, $S_\ell$. We find the subset of packed items by guessing the contribution of each compartment category to the overall profit. Let $P(S_\ell)$ denote the contribution of $S_\ell$. We may assume that $P(S_\ell) \ge \varepsilon P/w$; by that, we lose at most a factor of $\varepsilon$ from the overall profit. Then, we look for a vector $\mathbf{k}$ of integers, $w/\varepsilon \le k_\ell \le w^2/\varepsilon^2$; $k_\ell$ reflects the contribution of $S_\ell$ to the overall profit in some optimal packing, in multiples of $\varepsilon^2 P/w^2$, i.e., $k_\ell \varepsilon^2 P/w^2 \le P(S_\ell) < (k_\ell + 1)\varepsilon^2 P/w^2$.

We seek a vector $\mathbf{k} = (k_1, \ldots, k_w)$ satisfying $\sum_{\ell=1}^{w} k_\ell \le w^2/\varepsilon^2$. The number of such vectors is at most $(w^2/\varepsilon^2)^w$, which is a constant. Given the contribution of $S_\ell$, we select from $S_\ell$ the minimum number of items that provide this profit. That is, we order the sets of items in $S_\ell$ in non-increasing order by sizes, and select sets of items, starting from the largest set, until we get the desired profit. Note that (at most) one set in $S_\ell$ may be 'partially selected', i.e., we pack in the bins only some of the items in this set.
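The selection rule for one compartment category is a simple greedy pass. The sketch below assumes unit profits (so a set's size equals its profit) and takes the guessed contribution as an integer target; `select_sets` is a hypothetical name used only for illustration.

```python
def select_sets(set_sizes, target):
    """Greedy selection within one compartment category.

    set_sizes -- sizes of the item sets in the category (unit profits)
    target    -- guessed profit contribution of the category
    Returns (set index, items taken) pairs; only the last one can be partial.
    """
    # largest sets first, as in the text
    order = sorted(range(len(set_sizes)), key=lambda i: set_sizes[i], reverse=True)
    chosen, remaining = [], target
    for i in order:
        if remaining <= 0:
            break
        take = min(set_sizes[i], remaining)  # partial only when it hits the target
        chosen.append((i, take))
        remaining -= take
    return chosen
```

Taking the largest sets first minimizes the number of distinct sets (colors) touched, which is why at most one set ends up partially selected.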

5.2 Packing the Items 

In packing the selected items, we choose in each step a subset of the items (or, block) in some color. We distinguish between the sizes of the packed blocks, and the compartment requirements of the corresponding items. We say that a block is large if its size is at least $\varepsilon V$; otherwise, it is small. Also, the compartment requirement of color $k$ is large if $c_k > \varepsilon C$; otherwise it is small.

Items with Large Compartment Requirements We first pack blocks of items with large compartment requirements. Note that we can pack at most $1/\varepsilon$ such blocks in each bin. We increase the volume of the bins to $V(1+\varepsilon)$, and pack in each bin blocks in the sizes $\varepsilon^2 V, \ldots, V$. Thus, we round up the sizes of small blocks to $\varepsilon^2 V$. After packing the items, we eliminate extra volume, until we get that the sum of packed items in each bin is at most $V$. By that we lose at most a factor of $\varepsilon$ in the approximation ratio. To obtain a constant number of possible block sizes, we modify the input, so that no set of items is too large. This can harm the approximation ratio at most by a factor of $\varepsilon$, as formalized in the next lemma.

Lemma 2. The input $I$ can be transformed to $I'$, which satisfies (i) the size of any set of items is at most $V/\varepsilon$; (ii) any packing of items in $I'$ can be mapped



See also [6]. We omit the proof.
to a packing of items in $I$ of the same profit; (iii) $OPT(I') \ge (1-\varepsilon)\,OPT(I)$, where $OPT(I)$ is the optimal profit from packing $I$.

We now describe how we pack the set of items with large compartment requirements. Partition the item sets into the groups $P^1, \ldots, P^h$, such that all of the item sets in $P^r$, $1 \le r \le h$, are of the same size. We call $P^r$ the $r$-th profit category. Initially, we 'guess' the partition of item sets into blocks. Each partition gives the number of blocks in the sizes $\varepsilon^2 V, \ldots, V$, taken from the profit categories $\varepsilon^2 V, \ldots, V/\varepsilon$. Assuming that block sizes are in multiples of $\varepsilon^2 V$, and the sizes of the item sets are given as integral powers of $(1+\varepsilon)$, we get that the number of coordinates in each partition vector is $O(\lg(1/\varepsilon^3)/\varepsilon^3)$, which is a constant.

Note that in the above partition we define only the sizes of the blocks taken from each profit category $P^r$, $1 \le r \le h$; however, as the item sets in $P^r$ may be of different compartment requirements and different colors, for each of the blocks we need to decide also on the item set in $P^r$ from which it is extracted. This can be done greedily, without harming the approximation ratio of the scheme. Specifically, given the block configuration for $P^r$, we sort the item sets in $P^r$ in non-decreasing order by their compartment requirements. The blocks are sorted in non-decreasing order by sizes. We now extract the blocks in the list from the sets, starting from the first item set. Once all of the items in the first set are allocated to blocks, we proceed to allocate blocks from the second set, and so on. In this process we allocate 'small' blocks from sets that have 'small' compartment requirement. Clearly, this decreases the potential number of compartments required for packing the item blocks. Now, we have a set of item blocks of given colors that need to be packed in the bins.
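The greedy extraction for one profit category can be sketched as follows. Assumptions (ours, not the paper's): each item set is a `(color, compartment_req, size)` tuple, the guessed block sizes are plain integers, and when a block does not fit in what remains of the current set we simply move on to the next set (the text does not spell out this corner case). The name `extract_blocks` is hypothetical.

```python
def extract_blocks(sets, block_sizes):
    """Assign blocks (smallest first) to item sets (smallest compartment
    requirement first), exhausting one set before moving to the next.

    sets        -- list of (color, compartment_req, size) tuples
    block_sizes -- guessed block sizes for this profit category
    Returns a list of (block_size, color) pairs.
    """
    sets = sorted(sets, key=lambda s: s[1])  # non-decreasing compartment requirement
    blocks = sorted(block_sizes)             # non-decreasing block size
    result = []
    idx = 0
    left = sets[0][2] if sets else 0         # items left in the current set
    for b in blocks:
        # simplification: skip to the next set if the block does not fit
        while idx < len(sets) and left < b:
            idx += 1
            left = sets[idx][2] if idx < len(sets) else 0
        if idx == len(sets):
            raise ValueError("block sizes exceed the available items")
        result.append((b, sets[idx][0]))
        left -= b
    return result
```

With a low-requirement set `("blue", 2, 6)` and a high-requirement set `("red", 5, 6)`, blocks of sizes 2 and 4 come from the blue set and the block of size 6 from the red one, so the smaller blocks draw on the cheaper compartment requirement.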

Given a correct guess of the partition of the item sets to blocks, we may 
now assume that each block is packed in a distinct set of compartments in some 
bin. Hence, we now have an instance of the 2DVP with fixed number of distinct 
item sizes in each dimension. By defining a bin configuration to be the number 
of items in each size (in both dimensions) in a bin, and guessing the number of 
bins having each configuration, we can find the optimal packing of the blocks in 
the bins. 
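Enumerating bin configurations for the resulting two-dimensional vector packing instance is a bounded recursion over the block types. The sketch assumes each block type is described by a hypothetical `(volume, compartments)` pair with at least one positive coordinate; it only lists the configurations that fit one bin and does not perform the subsequent guess of how many bins use each configuration.

```python
def bin_configurations(types, V, C):
    """Yield count vectors (n_1, ..., n_T) such that
    sum n_t * volume_t <= V and sum n_t * compartments_t <= C.

    types -- list of (volume, compartments) pairs, one per block type;
             each pair must have a positive coordinate so the loop ends.
    """
    def rec(t, vol_left, comp_left, prefix):
        if t == len(types):
            yield tuple(prefix)
            return
        v, c = types[t]
        count = 0
        while count * v <= vol_left and count * c <= comp_left:
            prefix.append(count)
            yield from rec(t + 1, vol_left - count * v, comp_left - count * c, prefix)
            prefix.pop()
            count += 1

    yield from rec(0, V, C, [])
```

Since the number of block types and the number of blocks per bin are both bounded by constants depending on $\varepsilon$, the number of configurations is a constant as well.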

Large Blocks with Small Compartment Requirements In this step we pack large blocks of items whose compartment requirement is at most $\varepsilon C$. Since at most $1/\varepsilon$ large blocks can be packed in a bin, we look at items in the profit categories $\varepsilon V, \ldots, V/\varepsilon$. We guess as before the blocks extracted from each of the item sets in the small compartment categories. Then, we can find in polynomial time an optimal packing of these blocks in the bins, using bin configurations.

Packing the Remaining Items Finally, we pack small blocks of items with 
small compartment requirements, by using linear programming and rounding 
the (fractional) solution. The program takes as input the set of small blocks 
generated from the remaining sets of items. We need to allocate for each block 
of items of color $k$ a set of $c_k$ compartments in some bin. This is done by reducing
our problem to GAP and using a technique of [15] for solving the GAP problem 
(see the details in [14]). 
We summarize our discussion in the following result.

Theorem 10. The above scheme yields a $(1+\varepsilon)$-approximation for VPSD instances with unit sizes and profits, and $w$ compartment categories. The running time of the scheme is polynomial in $n$ for any fixed $\varepsilon$ and $w$.

Thus, we have a PTAS for the data placement problem, with $N$ identical disks and $w$ distinct file sizes, with running time as given in Theorem 10.

Finally, our scheme can be extended to apply to VPSD instances in which 
the item sizes take at most t distinct sizes, where t is fixed. 

6 Approximation Scheme for Fixed Number of Colors 

For instances in which the number of distinct colors is fixed, we show that VPSD admits a PTAS, even if the items have arbitrary sizes and profits, and the bins are arbitrary. Our scheme builds on an approximation technique presented in [13] for CCMK. However, since the scheme in [13] relies heavily on the fact that the number of compartments in each bin can be bounded by some constant $\le M$ (since the compartment requirement of any color $k$ is $c_k = 1$), we need to use a different approach. Our technique relies on a partition of the bins into $O(\log(N/\varepsilon))$ types, by rounding up the volumes, and by eliminating compartments, such that in the resulting instance all the bins of the same type have (almost) the same volume and the same number of compartments. This is done without harming the approximation ratio of the scheme.

Theorem 11. There is a PTAS for VPSD instances in which M is a fixed 
constant. 

Acknowledgments. We thank Chandra Chekuri for valuable discussions and 
suggestions. 
Abstract. We consider the problem of computing the outer-radii of point sets. In this problem, we are given integers $n, d, k$ where $k \le d$, and a set $P$ of $n$ points in $\mathbb{R}^d$. The goal is to compute the outer $k$-radius of $P$, denoted by $R_k(P)$, which is the minimum, over all $(d-k)$-dimensional flats $F$, of $\max_{p \in P} d(p, F)$, where $d(p, F)$ is the Euclidean distance between the point $p$ and flat $F$. Computing the radii of point sets is a fundamental problem in computational convexity with many applications. The problem admits a polynomial time algorithm when the dimension $d$ is constant [9]. Here we are interested in the general case when the dimension $d$ is not fixed and can be as large as $n$, where the problem becomes NP-hard even for $k = 1$.

It has been known that $R_k(P)$ can be approximated in polynomial time by a factor of $(1+\varepsilon)$, for any $\varepsilon > 0$, when $d - k$ is a fixed constant [15,2]. A factor of $O(\sqrt{\log n})$ approximation for $R_1(P)$, the width of the point set $P$, is implied by the results of Nemirovskii et al. [19] and Nesterov [18]. The first approximation algorithm for general $k$ was proposed by Varadarajan, Venkatesh and Zhang [20]. Their algorithm is based on semidefinite programming relaxation and the Johnson-Lindenstrauss lemma, and it has a performance guarantee of $O(\sqrt{\log n \cdot \log d})$.

In this paper, we show that $R_k(P)$ can be approximated by a ratio of $O(\sqrt{\log n})$ for any $1 \le k \le d$ and thereby improve the ratio of [20] by a factor of $O(\sqrt{\log d})$ that could be as large as $O(\sqrt{\log n})$. This ratio also matches the previously best known ratio for approximating the special case $R_1(P)$, the width of point set $P$. Our algorithm is based on semidefinite programming relaxation with a new mixed deterministic and randomized rounding procedure.



1 Introduction 

Computing the outer $k$-radius of a point set is a fundamental problem in computational convexity with applications in global optimization, data mining, statistics and clustering, and has received considerable attention in the computational geometry literature [12,13,15]. In this problem, we are given integers $n, d, k$ where $k \le d$, and a set $P$ of $n$ points in $\mathbb{R}^d$. The goal is to compute the outer $k$-radius of $P$, denoted by $R_k(P)$, which is the minimum, over all $(d-k)$-dimensional flats $F$, of $\max_{p \in P} d(p, F)$, where $d(p, F)$ is the Euclidean distance between the point $p$ and flat $F$. A $(d-k)$-flat is simply an affine subspace of $\mathbb{R}^d$ of dimension $d-k$. (See Section 2 for a more precise definition of $R_k(P)$.) Roughly speaking, the outer $k$-radius $R_k(P)$ measures how well the point set $P$ can be approximated by an affine subspace of dimension $d-k$. A few special cases of $R_k(P)$ that have received particular attention include: $R_1(P)$, the width of $P$; $R_d(P)$, the radius of the minimum enclosing ball of $P$; and $R_{d-1}(P)$, the radius of the minimum enclosing cylinder of $P$.

When the dimension $d$ is a fixed constant, $R_k(P)$ can be computed exactly in polynomial time [9]. It is also known that $R_k(P)$ can be approximated by a factor of $(1+\varepsilon)$, for any $\varepsilon > 0$, in $O(n + \ldots)$ time [1,14]. In this paper, we are interested in the general scenario when the dimensions $k$ and $d$ are not fixed and $d$ can be as large as $n$.

When the dimensions $k$ and $d$ are part of the input, the complexity of computing $R_k(P)$ depends on whether $d - k$ is constant or not. It is well known that the problem is polynomial time solvable when $d - k = 0$, i.e., the minimum enclosing ball of a set of points can be computed in polynomial time (Gritzmann and Klee [12]). To the best of our knowledge, whether the problem is NP-hard or not is still open when $d - k = 1$. However, Badoiu et al. [2] show that $R_{d-1}(P)$ can be approximated in polynomial time by a factor of $(1+\varepsilon)$, for any $\varepsilon > 0$. Har-Peled and Varadarajan [15,16] generalize the result and show that $R_k(P)$ can be approximated by a factor of $(1+\varepsilon)$ for any $\varepsilon > 0$ when $d - k$ is constant.

More hardness results are known when $d - k$ becomes large or $k$ becomes small. Bodlaender et al. [4] show that the problem is NP-hard when $k = 1$. This is true even for the case $n = 2d$ [12]. Gritzmann and Klee [12] also show that it is NP-hard to compute $R_k(P)$ if $k < c \cdot d$, for any fixed $0 < c < 1$. These negative results are further improved by Brieden et al. [5] and Brieden [8], the latter of which has shown that it is NP-hard to approximate $R_1(P)$, the width of a point set, to within any constant factor. Furthermore, Varadarajan, Venkatesh and Zhang [20] show that there exists some constant $\delta > 0$ such that for any $0 < \varepsilon < 1$, there is no quasi-polynomial time algorithm that approximates $R_k(P)$ within $(\log n)^{\delta}$ for $k < d - d^{\varepsilon}$ unless $NP \subseteq DTIME[2^{(\log m)^{O(1)}}]$.

On the positive side, the algorithms of Nemirovskii et al. [18] and Nesterov [19] imply that $R_1(P)$, the width of point set $P$, can be approximated within a factor of $O(\sqrt{\log n})$. Another algorithm for approximating the width of a point set is given by Brieden et al. [6,7], and their algorithm has a performance guarantee $\sqrt{d/\log d}$ that is measured in the dimension $d$. The first approximation algorithm for general $k$ was proposed by Varadarajan et al. [20]. Their algorithm is based on semidefinite programming relaxation and the Johnson-Lindenstrauss lemma, and has a performance factor of $O(\sqrt{\log n \cdot \log d})$.

The results mentioned above on computing $R_k(P)$ suggest that the problem is harder when $d - k$ becomes larger or $k$ becomes smaller. However, the result of Varadarajan et al. [20] does not confirm this trend, since we already know that $R_1(P)$ can be approximated by a factor of $O(\sqrt{\log n})$, while, for general $k$, the ratio proved in [20] is $O(\sqrt{\log n \cdot \log d})$, which is greater than $O(\sqrt{\log n})$. Therefore, they conjectured that the factor of $O(\sqrt{\log n})$ applies to general $k$ as well.

The main result of the present paper is to show that $R_k(P)$ can indeed be approximated by the factor of $O(\sqrt{\log n})$ for all $1 \le k \le d$, thereby proving their conjecture. Note that the new approximation ratio is reduced by a factor of $O(\sqrt{\log d})$, which could be as large as $O(\sqrt{\log n})$. Our algorithm is based on semidefinite programming relaxation with a mixed deterministic and randomized rounding procedure, in contrast to all other purely randomized rounding procedures used for semidefinite programming approximation.

2 Preliminaries and SDP Relaxation 

Generally speaking, the problem of computing $R_k(P)$ can be formulated as a quadratic minimization problem. Semidefinite programming (SDP) problems, where the unknowns are represented by positive semidefinite matrices, have recently been developed for approximating such problems; see, for example, Goemans and Williamson [10]. In the case of $k = 1$, computing $R_1(P)$ corresponds to an SDP problem plus an additional requirement that the rank of the unknown matrix equals 1. Removing the rank requirement, the SDP problem becomes a relaxation of the original problem and is polynomially solvable for any given accuracy. Once an optimal solution, say $X$, of the SDP relaxation is obtained, one would like to generate a rank-1 matrix, say $\hat{X} = yy^T$, from $X$, where $y$ is a column vector and serves as a solution to the original problem. Such rank reduction is called "rounding", and many rounding procedures have been proposed, almost all of them randomized; see, for example, [3].

One particular procedure has been proposed by Nemirovskii et al. [18] which can be used for approximating $R_1(P)$. Their procedure is a simple randomized rounding that can be described as follows: an optimal solution $X$ of the SDP relaxation, whose rank could be as large as $d$, can be represented (e.g., by eigenvector decomposition) as

$$X = \lambda_1 v_1 v_1^T + \lambda_2 v_2 v_2^T + \cdots + \lambda_d v_d v_d^T.$$

Then one can generate a single vector $y$ by taking a random linear combination of the vectors $\sqrt{\lambda_1}\,v_1, \sqrt{\lambda_2}\,v_2, \ldots, \sqrt{\lambda_d}\,v_d$, where the coefficients of the combination take values of $-1$ or $1$ uniformly and independently.

When $k \ge 2$, one needs to generate $k$ rank-1 matrices from $X$, the optimal solution of the corresponding SDP relaxation, such that

$$\hat{X} = \sum_{i=1}^{k} y_i y_i^T,$$

where the $y_i$'s are orthogonal to each other. The method of Varadarajan et al. [20] first applies the Johnson-Lindenstrauss randomized dimension reduction technique [17] to reduce the rank of solution $X$ to $k + O(\log n \cdot \log d)$ without losing much in the quality of the solution in terms of the objective value. Then they
show that among the $k + O(\log n \cdot \log d)$ vectors, which are orthogonal to each other, $k$ vectors can be randomly chosen as the solution with an approximation ratio $O(\sqrt{\log n \cdot \log d})$.

Our algorithm is based on the same SDP relaxation developed in [20]. However, our rounding procedure is different. Our procedure works as follows. Once we obtain an optimal solution for the SDP relaxation with

$$X = \lambda_1 v_1 v_1^T + \lambda_2 v_2 v_2^T + \cdots + \lambda_d v_d v_d^T,$$

we deterministically partition the vectors $v_1, v_2, \ldots, v_d$ into $k$ groups, where group $j$ may contain $n_j$ vectors and each group can be seen as a single semidefinite matrix with rank $n_j$. We can then generate one vector from each group using a randomized rounding procedure similar to that of Nemirovskii et al. [18]. The $k$ vectors generated by this rounding procedure automatically satisfy the condition that any pair of them must be orthogonal to each other, and the quality of these vectors has an approximation ratio no more than $O(\sqrt{\log n})$.

We now present the quadratic program formulation of the $k$-radii problem and its semidefinite programming relaxation. It will be helpful to first introduce some notation that will be used later. The trace of a given matrix $A$, denoted by $\mathrm{Tr}(A)$, is the sum of the entries on the main diagonal of $A$. We use $I$ to denote the identity matrix whose dimension will be clear from the context. The inner product of two vectors $p$ and $q$ is denoted by $\langle p, q\rangle$. The 2-norm of a vector $x$, denoted by $\|x\|$, is defined by $\sqrt{\langle x, x\rangle}$. That a matrix $X$ is positive semidefinite is written $X \succeq 0$. For simplicity, we assume that $P$ is symmetric in the sense that if $p \in P$ then $-p \in P$. Denote the set $\{-p \mid p \in P\}$ by $-P$. Let $Q = P \cup -P$. It is clear that $R_k(P) \le R_k(Q) \le 2R_k(P)$. Therefore, if we find a good approximation for $R_k(Q)$ then it must also be a good approximation for $R_k(P)$. Furthermore, since $Q$ is a symmetric point set, the best $(d-k)$-flat for $Q$ contains the origin, so that it is a subspace.

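Thus, the square of $R_k(P)$ can be defined by the optimal value of the following quadratic minimization problem:

$$
\begin{aligned}
R_k(P)^2 := \text{Minimize } & \alpha \\
\text{Subject to } & \textstyle\sum_{i=1}^{k} \langle p, x_i\rangle^2 \le \alpha, \quad \forall p \in P, \\
& \|x_i\|^2 = 1, \quad i = 1, \ldots, k, \\
& \langle x_i, x_j\rangle = 0, \quad \forall i \ne j.
\end{aligned}
\qquad (1)
$$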

Assume that $x_1, x_2, \ldots, x_k \in \mathbb{R}^d$ is the optimal solution of (1). Then one can easily verify that the matrix $X = x_1x_1^T + x_2x_2^T + \cdots + x_kx_k^T$ is a feasible solution for the following semidefinite program:

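$$
\begin{aligned}
\alpha_k^* := \text{Minimize } & \alpha \\
\text{Subject to } & \mathrm{Tr}(pp^T X)\ (= p^T X p) \le \alpha, \quad \forall p \in P, \\
& \mathrm{Tr}(X) = k, \\
& I - X \succeq 0, \quad X \succeq 0.
\end{aligned}
\qquad (2)
$$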
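The feasibility claim can be checked numerically on a small example. This is a pure-Python sketch with hypothetical helper names (`build_X`, `quad_form`, `is_projection`); it verifies $\mathrm{Tr}(X) = k$, that $X$ is a symmetric projection (so its eigenvalues lie in $\{0, 1\}$ and $0 \preceq X \preceq I$), and that $p^T X p = \sum_i \langle p, x_i\rangle^2$, which is exactly why the objective values of (1) and (2) agree on such an $X$.

```python
def build_X(xs):
    """Return X = x_1 x_1^T + ... + x_k x_k^T as a list of rows."""
    d = len(xs[0])
    return [[sum(x[r] * x[c] for x in xs) for c in range(d)] for r in range(d)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def quad_form(A, p):
    """p^T A p, which equals Tr(p p^T A)."""
    n = len(p)
    return sum(p[r] * A[r][c] * p[c] for r in range(n) for c in range(n))


def is_projection(A, tol=1e-9):
    """True if A is symmetric and A @ A == A (up to tol).

    Such a matrix has every eigenvalue in {0, 1}, so both A >= 0 and
    I - A >= 0 hold, i.e. the last constraint of (2) is satisfied.
    """
    n = len(A)
    A2 = matmul(A, A)
    return all(abs(A[i][j] - A[j][i]) <= tol and abs(A2[i][j] - A[i][j]) <= tol
               for i in range(n) for j in range(n))
```

For the orthonormal vectors $x_1 = (0.6, 0.8, 0)$, $x_2 = (0, 0, 1)$ and $p = (1, 2, 3)$, the trace is 2 and `quad_form` gives $2.2^2 + 3^2 = 13.84$ up to rounding.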
It follows that $\alpha_k^* \le R_k(P)^2$. The following lemma, which is proved in Varadarajan et al. [20], follows from the above observations. We reproduce the proof below for completeness.

Lemma 1 There exists an integer $r \ge k$ such that we can compute, in polynomial time, $r$ nonnegative reals $\lambda_1, \lambda_2, \ldots, \lambda_r$ and $r$ orthogonal unit vectors $v_1, v_2, \ldots, v_r$ such that

(i) $\sum_{i=1}^{r} \lambda_i = k$.

(ii) $\max_{1 \le i \le r} \lambda_i \le 1$.

(iii) $\sum_{i=1}^{r} \lambda_i \langle p, v_i\rangle^2 \le R_k(P)^2$, for any $p \in P$.

Proof. We solve the semidefinite program (2), and let $X^*$ be an optimal solution of (2). We claim that the rank of $X^*$, say $r$, is at least $k$. This follows from the fact that $\mathrm{Tr}(X^*) = k$ and $I - X^* \succeq 0$. In other words, $\mathrm{Tr}(X^*) = k$ implies that the sum of the eigenvalues of $X^*$ is equal to $k$, and $I - X^* \succeq 0$ implies that all eigenvalues are less than or equal to 1. Therefore, $X^*$ has at least $k$ non-zero eigenvalues, which implies that the rank of $X^*$ is at least $k$. Let $\lambda_1, \lambda_2, \ldots, \lambda_r$ be the $r$ non-zero (hence positive) eigenvalues and $v_1, v_2, \ldots, v_r$ be the corresponding eigenvectors. Then we have $\sum_{i=1}^{r} \lambda_i = k$ and $\max_{1 \le i \le r} \lambda_i \le 1$. Furthermore, for any $p \in P$,

$$\sum_{i=1}^{r} \lambda_i \langle p, v_i\rangle^2 = \mathrm{Tr}\Big(pp^T \sum_{i=1}^{r} \lambda_i v_i v_i^T\Big) = \mathrm{Tr}(pp^T X^*) \le \alpha_k^* \le R_k(P)^2.$$

□



3 Deterministic First Rounding 

In this section, we prove a lemma concerning how to deterministically group the eigenvalues and their eigenvectors. The proof of the lemma is elementary, but it plays an important role in proving our main result.

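Lemma 2 The index set $\{1, 2, \ldots, r\}$ can be partitioned into $k$ sets $I_1, I_2, \ldots, I_k$ such that

(i) $\bigcup_{i=1}^{k} I_i = \{1, 2, \ldots, r\}$, and for any $i \ne j$, $I_i \cap I_j = \emptyset$.

(ii) For any $i$: $1 \le i \le k$, $\sum_{j \in I_i} \lambda_j \ge \frac{1}{2}$.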

Proof. Recall that $\sum_{j=1}^{r} \lambda_j = k$ and $0 \le \lambda_j \le 1$ for all $j$. Without loss of generality, we can assume that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r$. Our partitioning algorithm is the same as the Longest-Processing-Time heuristic for the parallel machine scheduling problem. The algorithm works as follows:

STEP 1. For $i = 1, 2, \ldots, k$, set $I_i = \emptyset$ and let $L_i = 0$. Let $I = \{1, 2, \ldots, r\}$.
STEP 2. While $I \ne \emptyset$:
    choose $j$ from $I$ with the smallest index;
    choose set $i$ with the smallest value $L_i$;
    let $I_i := I_i \cup \{j\}$, $L_i := L_i + \lambda_j$ and $I := I - \{j\}$.
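It is clear that when the algorithm stops, the sets $I_1, I_2, \ldots, I_k$ satisfy condition (i). Now we prove condition (ii) by contradiction. Assume that there exists some $t$ such that $\sum_{j \in I_t} \lambda_j < \frac{1}{2}$.

We now claim that, for all $i$, $\sum_{j \in I_i} \lambda_j \le 1$. Otherwise, suppose $\sum_{j \in I_{t'}} \lambda_j > 1$ for some $t'$. Note that $\lambda_j \le 1$ for every $j$, and thus at least two eigenvalues are assigned to $I_{t'}$. Denote the last eigenvalue assigned to $I_{t'}$ by $\lambda_{s'}$. It follows that $\sum_{j \in I_{t'} \setminus \{s'\}} \lambda_j \le \sum_{j \in I_t} \lambda_j$ since, otherwise, we would not have assigned $\lambda_{s'}$ to $I_{t'}$ in the algorithm. However, since $\sum_{j \in I_t} \lambda_j < \frac{1}{2}$, we must have $\sum_{j \in I_{t'} \setminus \{s'\}} \lambda_j < \frac{1}{2}$. Thus, $\lambda_{s'} > \frac{1}{2}$. This is impossible since $\lambda_{s'}$ is the last eigenvalue assigned to $I_{t'}$, which implies $\lambda_{s'} \le \lambda_j$ for every $j \in I_{t'}$, and we have already proved that there must exist an $l$ such that $s' \ne l \in I_{t'}$ and $\lambda_l \le \sum_{j \in I_{t'} \setminus \{s'\}} \lambda_j < \frac{1}{2}$. Therefore, $\sum_{j \in I_i} \lambda_j \le 1$ for all $i$, and in particular $\sum_{j \in I_t} \lambda_j < \frac{1}{2}$. It follows that $\sum_{i=1}^{k} \sum_{j \in I_i} \lambda_j < k$. However, we know that, by condition (i), $\sum_{i=1}^{k} \sum_{j \in I_i} \lambda_j = \sum_{j=1}^{r} \lambda_j = k$. This results in a contradiction. Therefore, such a $t$ does not exist and we have proved condition (ii). □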



Notice that the running time of the partitioning algorithm is bounded by $O(r \cdot k)$. An alternative way of partitioning the eigenvalues is the following: First, put the eigenvalues that are greater than or equal to 1/2 into distinct subsets. If the number of such eigenvalues, say $l$, is not less than $k$, then we are done. Otherwise, arbitrarily put the remaining eigenvalues into $k - l$ subsets such that the sum of eigenvalues in each subset is greater than or equal to 1/2. This method was suggested by an anonymous referee.
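The Longest-Processing-Time partition of Lemma 2 translates directly into code. In this sketch (function name hypothetical) indices are 0-based, the eigenvalues are sorted first instead of being assumed pre-sorted, and each index goes to the currently least-loaded set; after sorting, the loop is $O(r \cdot k)$ as stated above.

```python
def lpt_partition(lams, k):
    """Partition indices 0..r-1 into k groups by the LPT rule.

    lams -- eigenvalues with sum k and each in [0, 1] (as in Lemma 1)
    Returns k lists of indices; by Lemma 2 each group's sum is >= 1/2.
    """
    # largest eigenvalue first ("smallest index" after sorting)
    order = sorted(range(len(lams)), key=lambda j: lams[j], reverse=True)
    groups = [[] for _ in range(k)]
    loads = [0.0] * k
    for j in order:
        i = min(range(k), key=lambda t: loads[t])  # least-loaded set
        groups[i].append(j)
        loads[i] += lams[j]
    return groups
```

For $\lambda = (0.9, 0.3, 0.3, 0.25, 0.25)$ and $k = 2$, the rule yields the groups $\{1\}$ and $\{2, 3, 4, 5\}$ (1-based), with sums $0.9$ and $1.1$.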



4 Randomized Second Rounding 



Assume now that we have found $I_1, I_2, \ldots, I_k$. Then our next randomized rounding procedure works as follows.

STEP 1. Generate an $r$-dimensional random vector $\phi$ such that each entry of $\phi$ takes value, independently, $-1$ or $1$ with probability $\frac{1}{2}$ each way.

STEP 2. For $i = 1, 2, \ldots, k$, let

$$x_i = \frac{\sum_{j \in I_i} \phi_j \sqrt{\lambda_j}\, v_j}{\sqrt{\sum_{j \in I_i} \lambda_j}}.$$
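The two steps can be sketched in pure Python. The function name `round_groups` and the list-of-floats representation of the eigenvectors are our own illustrative choices; the test below uses standard basis vectors as the orthonormal $v_j$, which suffices to check the unit-norm and orthogonality properties established in the following lemmas.

```python
import math
import random


def round_groups(lams, vs, groups, rng):
    """Randomized second rounding.

    lams   -- eigenvalues lambda_1..lambda_r
    vs     -- matching orthonormal eigenvectors (lists of floats)
    groups -- index sets I_1..I_k (0-based), e.g. from the partition of Lemma 2
    rng    -- a random.Random instance supplying the signs phi_j
    """
    phi = [rng.choice((-1, 1)) for _ in lams]     # STEP 1: random signs
    d = len(vs[0])
    xs = []
    for I in groups:                               # STEP 2: one vector per group
        scale = math.sqrt(sum(lams[j] for j in I))
        x = [0.0] * d
        for j in I:
            coef = phi[j] * math.sqrt(lams[j]) / scale
            for t in range(d):
                x[t] += coef * vs[j][t]
        xs.append(x)
    return xs
```

Because the groups are disjoint and the $v_j$ are orthonormal, the output is orthonormal for every choice of signs; only the quality bound of Lemma 6 is probabilistic.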




The following lemmas show that $x_1, x_2, \ldots, x_k$ form a feasible solution for the original problem. In other words, they are $k$ orthogonal unit vectors.



Lemma 3 For $i = 1, 2, \ldots, k$, $\|x_i\| = 1$.
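Proof. Recall that $\langle v_l, v_j\rangle = 0$ for any $l \ne j$ and $\|v_j\| = 1$. By definition,

$$
\begin{aligned}
\|x_i\|^2 = \langle x_i, x_i\rangle
&= \frac{1}{\sum_{j \in I_i} \lambda_j} \Big\langle \sum_{j \in I_i} \phi_j \sqrt{\lambda_j}\, v_j,\ \sum_{l \in I_i} \phi_l \sqrt{\lambda_l}\, v_l \Big\rangle \\
&= \frac{1}{\sum_{j \in I_i} \lambda_j} \sum_{j \in I_i} \phi_j^2\, \lambda_j\, \|v_j\|^2 \\
&= \frac{1}{\sum_{j \in I_i} \lambda_j} \sum_{j \in I_i} \lambda_j = 1.
\end{aligned}
$$

□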



Lemma 4 If $s \ne t$ then $\langle x_s, x_t\rangle = 0$.

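Proof. The proof is similar to that of Lemma 3.

$$
\begin{aligned}
\langle x_s, x_t\rangle
&= \frac{1}{\sqrt{\sum_{j \in I_s} \lambda_j}\,\sqrt{\sum_{l \in I_t} \lambda_l}} \Big\langle \sum_{j \in I_s} \phi_j \sqrt{\lambda_j}\, v_j,\ \sum_{l \in I_t} \phi_l \sqrt{\lambda_l}\, v_l \Big\rangle \\
&= \frac{1}{\sqrt{\sum_{j \in I_s} \lambda_j}\,\sqrt{\sum_{l \in I_t} \lambda_l}} \sum_{j \in I_s} \sum_{l \in I_t} \phi_j \phi_l \sqrt{\lambda_j \lambda_l}\, \langle v_j, v_l\rangle = 0.
\end{aligned}
$$

The last equality holds since for any $j \in I_s$ and $l \in I_t$, $\langle v_j, v_l\rangle = 0$. □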

Now we establish a bound on the performance of our algorithm. First, let us 
introduce Bernstein’s Theorem (see, e.g., [18]), which is a form of the Chernoff 
Bound. 
Lemma 5 Let $\phi$ be a random vector whose entries are independent and either $1$ or $-1$ with probability $.5$ each way. Then, for any vector $e$ and $\beta > 0$,

$$\mathrm{prob}\{\langle \phi, e\rangle^2 > \beta \|e\|^2\} < 2 \exp(-\beta/2).$$
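A quick Monte Carlo experiment makes the bound of Lemma 5 concrete. This is only an illustration, not part of the proof: `tail_estimate` and `bernstein_bound` are hypothetical helper names, and a fixed seed keeps the run reproducible.

```python
import math
import random


def bernstein_bound(beta):
    """Right-hand side of Lemma 5: 2 * exp(-beta / 2)."""
    return 2 * math.exp(-beta / 2)


def tail_estimate(e, beta, trials, seed=0):
    """Empirical estimate of prob{<phi, e>^2 > beta * ||e||^2}
    for a uniformly random sign vector phi."""
    rng = random.Random(seed)
    norm2 = sum(c * c for c in e)
    hits = 0
    for _ in range(trials):
        s = sum(rng.choice((-1, 1)) * c for c in e)
        if s * s > beta * norm2:
            hits += 1
    return hits / trials
```

With $e = (1, \ldots, 1) \in \mathbb{R}^{20}$ and $\beta = 4$, the exact tail probability is about 0.041, well below the bound $2e^{-2} \approx 0.27$.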

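Let $C_{ip} = \sum_{j \in I_i} \lambda_j \langle p, v_j\rangle^2$. Then we have

Lemma 6 For each $i = 1, 2, \ldots, k$ and each $p \in P$, we have

$$\mathrm{prob}\{\langle p, x_i\rangle^2 > 12 \log(n) \cdot C_{ip}\} < \frac{2}{n^3}.$$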

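Proof. Given $i$ and $p$, define an $|I_i|$-dimensional vector $e$ whose entries are $\sqrt{\lambda_j}\,\langle p, v_j\rangle$, $j \in I_i$, respectively. Furthermore, we define the vector $\phi|_{I_i}$ whose entries are those of $\phi$ with indices in $I_i$. First notice that

$$\|e\|^2 = \sum_{j \in I_i} \big(\sqrt{\lambda_j}\,\langle p, v_j\rangle\big)^2 = \sum_{j \in I_i} \lambda_j \langle p, v_j\rangle^2 = C_{ip}.$$

On the other hand,

$$
\begin{aligned}
\langle p, x_i\rangle^2
&= \frac{1}{\sum_{j \in I_i} \lambda_j} \Big\langle p, \sum_{j \in I_i} \phi_j \sqrt{\lambda_j}\, v_j \Big\rangle^2 \\
&\le 2 \Big\langle p, \sum_{j \in I_i} \phi_j \sqrt{\lambda_j}\, v_j \Big\rangle^2 \qquad \Big(\text{since } \textstyle\sum_{j \in I_i} \lambda_j \ge \frac{1}{2}\Big) \\
&= 2 \Big( \sum_{j \in I_i} \phi_j \sqrt{\lambda_j}\, \langle p, v_j\rangle \Big)^2 \\
&= 2 \langle \phi|_{I_i}, e\rangle^2.
\end{aligned}
$$

Thus

$$\mathrm{prob}\{\langle p, x_i\rangle^2 > 12 \log(n)\, C_{ip}\} \le \mathrm{prob}\{\langle \phi|_{I_i}, e\rangle^2 > 6 \log(n)\, \|e\|^2\}.$$

Therefore, the conclusion of the lemma follows by using Lemma 5 and by letting $\beta = 6 \log(n)$. □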

Theorem 1 We can compute in polynomial time a $(d-k)$-flat $F$ such that, with probability at least $1 - \frac{2}{n}$, the distance between any point $p \in P$ and $F$ is at most $\sqrt{12 \log(n)} \cdot R_k(P)$.



186 Yinyu Ye and Jiawei Zhang 



Proof. For given $i = 1, 2, \ldots, k$ and $p \in P$, consider the event

$$B_{ip} = \{\phi \mid \langle p, x_i\rangle^2 > 12 \log(n) \cdot C_{ip}\}$$

and $B = \bigcup_{i,p} B_{ip}$. The probability that the event $B$ happens is bounded by

$$\sum_{i,p} \mathrm{prob}\{\langle p, x_i\rangle^2 > 12 \log(n) \cdot C_{ip}\} < \frac{2kn}{n^3} \le \frac{2}{n}.$$

If $B$ does not happen, then for any $i$ and $p$,

$$\langle p, x_i\rangle^2 \le 12 \log(n) \cdot C_{ip}.$$

Therefore, for each $p \in P$,

$$\sum_{i=1}^{k} \langle p, x_i\rangle^2 \le 12 \log(n) \sum_{i=1}^{k} C_{ip} \le 12 \log(n) \cdot R_k(P)^2.$$

The last inequality follows from Lemma 1. This completes the proof by taking $F$ as the flat which is orthogonal to the vectors $x_1, x_2, \ldots, x_k$. □

5 Final Remark 

Finding efficient rounding methods for semidefinite programming relaxation plays a key role in constructing better approximation algorithms for various hard optimization problems. All of the rounding methods developed to date are randomized in nature. Therefore, the mixed deterministic and randomized rounding procedure developed in this paper may be of independent interest. We expect to see more applications of the procedure in approximating various computational geometry and space embedding problems.
Abstract. We describe an efficient randomized algorithm to test if a given binary function $f: \{0,1\}^n \to \{0,1\}$ is a low-degree polynomial (that is, a sum of low-degree monomials). For a given integer $k \ge 1$ and a given real $\varepsilon > 0$, the algorithm queries $f$ at $O(1/\varepsilon + k 4^k)$ points. If $f$ is a polynomial of degree at most $k$, the algorithm always accepts, and if the value of $f$ has to be modified on at least an $\varepsilon$ fraction of all inputs in order to transform it to such a polynomial, then the algorithm rejects with probability at least 2/3. Our result is essentially tight: any algorithm for testing degree-$k$ polynomials over $GF(2)$ must perform $\Omega(1/\varepsilon + 2^k)$ queries.



1 Introduction 

In this work we consider the problem of testing whether a binary function $f: \{0,1\}^n \to \{0,1\}$ is a polynomial of degree at most $k$ satisfying $f(0, \ldots, 0) = 0$, for a given integer parameter $k$. Such a polynomial is simply a sum (modulo 2) of monomials each being a product of at most $k$ variables, with the free term equal to zero. (The restriction $f(0, \ldots, 0) = 0$ is imposed mainly for historical reasons, to make our definition and result consistent with the previously treated case of linear functions, $k = 1$. With minor changes our algorithm can be adapted to test the class of all polynomials of degree at most $k$ in $n$ variables, without the restriction on the free term.) The algorithm is required to accept functions that are polynomials of degree at most $k$ (vanishing at zero), and to reject, with probability at least 2/3, functions that are far from any such polynomial. More precisely, the algorithm is given a distance parameter $\varepsilon$, and is required to reject (with probability at least 2/3) any function whose value should be modified on more than an $\varepsilon$-fraction of the domain to become a degree-$k$ polynomial vanishing at zero. To this end the algorithm can query the function $f$ on inputs of its choice, where our goal is to minimize the query complexity of the algorithm (as a function of $k$, $1/\varepsilon$, and $n$).

The problem of testing multivariate low-degree polynomials has been studied quite extensively [4,3,13,11,17,12,2], and has important applications in the context of Probabilistically Checkable Proofs (PCP). However, with the exception of the case $k = 1$, that is, linear functions (which we discuss below), all results apply only to testing polynomials over fields that are larger than $k$ (the degree bound). When the field $F$ is sufficiently large, it is possible to reduce the problem of testing whether a function $f: F^n \to F$ is a multivariate degree-$k$ polynomial to testing whether a function is a degree-$k$ univariate polynomial, where the latter task is simply based on interpolation. Namely, the test for $f$ selects random lines in $F^n$ (more precisely, in the finite projective geometry $PG(n-1, |F|)$), and verifies that the restriction of $f$ to each of these lines is a (univariate) polynomial of degree at most $k$. This reduction does not hold for small fields, and in particular for $GF(2)$, which is our focus.

As noted above, in the case of $k = 1$ (linear functions), the linearity test of Blum, Luby and Rubinfeld [10] works also when the underlying field is $GF(2)$. In fact, our test can be viewed as an extension of the [10] algorithm, as we explain in more detail below. Linearity testing has also been studied in the following papers [4, 11, 6, 7, 5].



Our Results 

We describe and analyze an algorithm that tests whether a function $f: \{0,1\}^n \to \{0,1\}$ is a degree-$k$ polynomial satisfying $f(0, \ldots, 0) = 0$, or is $\varepsilon$-far from any such polynomial, using $O(1/\varepsilon + k \cdot 2^{2k})$ queries. As we show, the exponential dependency on $k$ is unavoidable. This is in contrast to the case of testing degree-$k$ polynomials over larger fields, where the sample complexity is polynomial in $k$. Our testing algorithm is simple. It repeats the following check $O(1/(\varepsilon 2^k) + k 2^k)$ times: it selects, uniformly and at random, $k+1$ vectors $y_1, \ldots, y_{k+1} \in \{0,1\}^n$. It then evaluates $f$ on the sum of every non-empty subset of the selected vectors, and checks that the sum of these evaluations is 0. If all checks succeed then it accepts, otherwise it rejects. Note that for the special case of $k = 1$, we obtain the linearity test of [10], which uniformly selects $O(1/\varepsilon)$ pairs $y_1, y_2 \in \{0,1\}^n$ and verifies for each pair that $f(y_1) + f(y_2) = f(y_1 + y_2)$. Our choice of the sets corresponds to a random selection of a $(k+1)$-dimensional subspace in the affine geometry $AG(n,2)$ (see for example [14, Chap. 12]). In case $k = 1$ we deal with lines of the affine geometry $PG(n,2)$.

190 Noga Alon et al.

As a by-product of our analysis we obtain a self-corrector (as defined in [10]) for $f$, in case $f$ is sufficiently close to a degree-$k$ polynomial $g$. Specifically, for any given $x \in \{0,1\}^n$, it is possible to obtain the value $g(x)$ with high probability by querying $f$ on additional, randomly selected, points.
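A minimal sketch of such a self-corrector follows, under stated assumptions: vectors in $\{0,1\}^n$ are encoded as $n$-bit Python integers (so vector addition over $GF(2)$ is XOR), a single vote for $g(x)$ is the sum of $f$ over all non-empty subset sums of $x, y_2, \ldots, y_{k+1}$ other than $x$ itself (the quantity $T_f^x(y_2, \ldots, y_{k+1})$ defined in Section 2), and the answer is the majority of independent votes. The name `self_correct` and the default of 25 votes are illustrative choices, not the paper's specification.

```python
import random

def self_correct(f, x, n, k, trials=25, rng=random):
    """Majority-vote estimate of g(x), where g is a degree-k polynomial with
    g(0) = 0 assumed to be close to f. Vectors in {0,1}^n are n-bit ints and
    vector addition over GF(2) is XOR. `trials` should be odd."""
    ones = 0
    for _ in range(trials):
        # y_1 = x is fixed; y_2, ..., y_{k+1} are uniform and independent.
        ys = [x] + [rng.randrange(1 << n) for _ in range(k)]
        vote = 0
        # Sum f over every non-empty subset of {x, y_2, ..., y_{k+1}} except
        # the singleton {x}: bit 0 of `mask` stands for x, so mask 1 is skipped.
        for mask in range(2, 1 << (k + 1)):
            point = 0
            for i, v in enumerate(ys):
                if mask >> i & 1:
                    point ^= v
            vote ^= f(point)
        ones += vote
    return int(2 * ones > trials)
```

For a degree-$k$ polynomial $g$ every vote is exact, because $T_g(x, y_2, \ldots, y_{k+1}) = 0$; when $f$ differs from $g$ on a small set, each vote reads $2^{k+1} - 2$ individually uniform points and can be wrong only if it hits that set.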

Relation to Coding 

Our setting and results have a very natural interpretation in terms of coding theory. The set of (evaluations of) all polynomials in $n$ variables of degree at most $k$ over $GF(2)$ is called the Reed-Muller code $\mathcal{R}(k,n)$ with parameters $k$ and $n$. (See, e.g., [16] for relevant background.) So our algorithm can be considered as (locally) testing Reed-Muller codes. To be more accurate, as we consider only polynomials $f$ vanishing at zero, we in fact test the so-called shortened Reed-Muller code $\mathcal{R}(k,n)^*$, obtained from $\mathcal{R}(k,n)$ by choosing all codewords with the first bit (i.e., the one corresponding to the zero vector) equal to zero, and deleting this bit. The Reed-Muller code $\mathcal{R}(k,n)$ is a linear code in $\{0,1\}^{2^n}$ of minimum distance $2^{n-k}$. The dual code of $\mathcal{R}(k,n)$ is the Reed-Muller code $\mathcal{R}(n-k-1,n)$. The dual code of the shortened Reed-Muller code $\mathcal{R}(k,n)^*$ is the so-called punctured Reed-Muller code with parameters $n-k-1$ and $n$, obtained from $\mathcal{R}(n-k-1,n)$ by deleting the first bit of every codeword. The minimum distance of the punctured Reed-Muller code with parameters $n-k-1$ and $n$ is $2^{k+1} - 1$, and its minimum weight codewords are obtained from the minimum weight codewords of $\mathcal{R}(n-k-1,n)$ having the first bit equal to 1, by deleting this bit; the number of minimum weight vectors is proportional to

For an arbitrary vector from $\{0,1\}^{2^n - 1}$ we want to distinguish between two cases: the vector belongs to the code, or, alternatively, it is at (Hamming) distance at least $\epsilon \cdot 2^n$ from the closest codeword of $\mathcal{R}(k,n)^*$. Our strategy is then to pick a random minimum weight vector from the punctured $\mathcal{R}(n-k-1,n)$, and to check whether it is orthogonal to the tested vector. Clearly, this will always confirm orthogonality if the considered vector is from the code. However, we prove that if the tested vector is far enough from the code, the test detects it with positive probability, and we give an estimate for this probability.
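In this coding view, a minimum weight word of the punctured code can be taken as the indicator of the nonzero points of the $(k+1)$-dimensional subspace spanned by linearly independent $y_1, \ldots, y_{k+1}$, so the orthogonality check sums the tested word over exactly the points that the tester queries. The sketch below assumes coordinates are indexed by the nonzero points $1, \ldots, 2^n - 1$ of $\{0,1\}^n$ encoded as integers; the function names are hypothetical.

```python
def subspace_indicator(ys, n):
    """0/1 word over the nonzero points 1..2^n - 1 marking span(ys) minus 0.
    For k+1 linearly independent ys its weight is 2^(k+1) - 1."""
    support = set()
    for mask in range(1, 1 << len(ys)):
        point = 0
        for i, v in enumerate(ys):
            if mask >> i & 1:
                point ^= v          # vector addition over GF(2)
        support.add(point)
    support.discard(0)
    return [1 if p in support else 0 for p in range(1, 1 << n)]

def orthogonal(word, dual_word):
    """True iff the inner product over GF(2) is zero."""
    return sum(a & b for a, b in zip(word, dual_word)) % 2 == 0
```

With $n = 4$ and $y_1, y_2, y_3$ the first three unit vectors, the truth table of $x_1x_2 + x_3$ (degree 2) is orthogonal to the indicator, while that of $x_1x_2x_3$ (degree 3) is not.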

2 Preliminaries 

For any integer $\ell$, we denote by $[\ell]$ the set $\{1, \ldots, \ell\}$. For any $k \in [n]$, let $\mathcal{P}_k$ denote the family of all Boolean functions over $\{0,1\}^n$ which are polynomials of degree at most $k$ without a free term. That is, $f \in \mathcal{P}_k$ if and only if there exist coefficients $a_S \in \{0,1\}$, for every $S \subseteq [n]$, $1 \le |S| \le k$, such that

$$f = \sum_{S \subseteq [n],\ |S| \in [k]} a_S \prod_{i \in S} x_i, \qquad (1)$$

where the addition is in $GF(2)$. In particular, $\mathcal{P}_1$ is the family of all linear functions over $\{0,1\}^n$, that is, all functions of the form $\sum_{i \in S} x_i$, where $S \subseteq [n]$.

For any two functions $f, g : \{0,1\}^n \to \{0,1\}$, the symmetric difference between $f$ and $g$ is $\Delta(f,g) \stackrel{\text{def}}{=} \{y \in \{0,1\}^n : f(y) \ne g(y)\}$. The relative distance $\mathrm{dist}(f,g) \in [0,1]$ between $f$ and $g$ is $\mathrm{dist}(f,g) \stackrel{\text{def}}{=} |\Delta(f,g)|/2^n$. For a function $g$ and a family of functions $F$, we say that $g$ is $\epsilon$-far from $F$, for some $0 < \epsilon < 1$, if, for every $f \in F$, $\mathrm{dist}(g,f) > \epsilon$. Otherwise it is $\epsilon$-close to $F$.
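For small $n$ these definitions can be checked by brute force on truth tables. The sketch below assumes a function is stored as a list of length $2^n$ indexed by the integer encoding of its input; the helper names are hypothetical. For example, on three variables the distance of $x_1x_2$ from $\mathcal{P}_1$ is exactly $1/4$, so it is $0.2$-far but not $0.25$-far from $\mathcal{P}_1$.

```python
def dist(f, g):
    """Relative distance |Delta(f, g)| / 2^n between two truth tables."""
    assert len(f) == len(g)
    return sum(a != b for a, b in zip(f, g)) / len(f)

def is_far(g, family, eps):
    """g is eps-far from `family` iff dist(g, f) > eps for every f in it."""
    return all(dist(g, f) > eps for f in family)
```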



A testing algorithm (tester) for $\mathcal{P}_k$ is a probabilistic algorithm that is given query access to a function $f$ and a distance parameter $\epsilon$, $0 < \epsilon < 1$. If $f$ belongs to $\mathcal{P}_k$ then with probability at least $2/3$ the tester should accept $f$, and if $f$ is $\epsilon$-far from $\mathcal{P}_k$, then with probability at least $2/3$ the tester should reject it. If the tester accepts every $f$ in $\mathcal{P}_k$ with probability 1, then it is a one-sided tester.



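The following notation will be used extensively in this paper. Given a function $f : \{0,1\}^n \to \{0,1\}$, for $y_1, \ldots, y_\ell \in \{0,1\}^n$ let

$$T_f(y_1, \ldots, y_\ell) \stackrel{\text{def}}{=} \sum_{\emptyset \ne S \subseteq [\ell]} f\Big(\sum_{i \in S} y_i\Big), \qquad (2)$$

where the first sum is over $GF(2)$ and the second one is over $(GF(2))^n$, and let

$$T_f^{y_1}(y_2, \ldots, y_\ell) \stackrel{\text{def}}{=} T_f(y_1, \ldots, y_\ell) + f(y_1). \qquad (3)$$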
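Equations (2) and (3) translate directly into code. The sketch below assumes vectors are $n$-bit integers (vector addition over $GF(2)$ is XOR) and $f$ is a Python callable returning 0 or 1; the names `T` and `T_y` are hypothetical.

```python
def T(f, ys):
    """T_f(y_1, ..., y_l) of Equation (2): the GF(2)-sum of f over the sums
    of all non-empty subsets of ys."""
    total = 0
    for mask in range(1, 1 << len(ys)):
        point = 0
        for i, v in enumerate(ys):
            if mask >> i & 1:
                point ^= v
        total ^= f(point)
    return total

def T_y(f, y, ys):
    """T_f^y(y_2, ..., y_l) of Equation (3): T_f(y, y_2, ..., y_l) + f(y)."""
    return T(f, [y, *ys]) ^ f(y)
```

For example, the linear function $x_1 + x_3$ gives $T_f = 0$ on every pair, whereas $x_1x_2$ gives $T_f(e_1, e_2) = 1$ and $T_f(e_1, e_2, e_3) = 0$.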



3 Characterization of Low Degree Polynomials over $\{0,1\}^n$

Claim 1. A function $f$ belongs to $\mathcal{P}_k$ (i.e., it is a polynomial of total degree at most $k$ satisfying $f(0, 0, \ldots, 0) = 0$) if and only if for every $y_1, \ldots, y_{k+1} \in \{0,1\}^n$ we have

$$T_f(y_1, \ldots, y_{k+1}) = 0. \qquad (4)$$

Proof. A polynomial from $\mathcal{P}_k$ can be viewed as a codeword in the appropriate Reed-Muller code, see, e.g., [16]. Thus, the above characterization can be proved using known facts about its dual. For completeness we provide a direct, simple proof.

We first prove that if a function $f$ belongs to $\mathcal{P}_k$ then $T_f(y_1, \ldots, y_{k+1}) = 0$ for every $y_1, \ldots, y_{k+1} \in \{0,1\}^n$.

As $f$ is a sum of monomials of total degree at most $k$, it suffices to show that for every monomial $m = \prod_{i \in I} x_i$, where $1 \le |I| \le k$, $T_m(y_1, \ldots, y_{k+1}) = 0$ for every $y_1, \ldots, y_{k+1} \in \{0,1\}^n$. The number of linear combinations $\sum_{j=1}^{k+1} b_j y_j$, where $b_j \in \{0,1\}$, for which $m(\sum_j b_j y_j) = 1$ is clearly the number of solutions of a linear system of $|I|$ equations in the $k+1$ variables $b_j$, and the trivial combination $b_j = 0$ for all $j$ is not one of the solutions. Therefore, this number of solutions (which is possibly zero) is divisible by $2^{k+1-|I|}$, which is even since $|I| \le k$. This shows that there is an even number of sets $S$ satisfying $\emptyset \ne S \subseteq [k+1]$ such that $m(\sum_{i \in S} y_i) = 1$, which implies that $T_m(y_1, \ldots, y_{k+1}) = 0$, as needed.

We next show that if $f = f(x_1, x_2, \ldots, x_n) : \{0,1\}^n \to \{0,1\}$ satisfies Equation (4) for every $y_1, y_2, \ldots, y_{k+1} \in \{0,1\}^n$, then $f \in \mathcal{P}_k$. Every function from $\{0,1\}^n$ to $\{0,1\}$ can be written uniquely as a polynomial over $GF(2)$:

$$f = \sum_{I \subseteq [n]} a_I \prod_{i \in I} x_i.$$

Our objective is to show that $a_\emptyset = 0$ and that $a_I = 0$ for all $|I| > k$. Taking $y_j = (0, 0, \ldots, 0)$ for every $j$ we conclude, by (4), that $a_\emptyset = 0$. Suppose, now, that there is a nonzero $a_I$ with $|I| > k$. Take such an $I$ of minimum cardinality, and assume, without loss of generality, that $I = [s]$ with $s \ge k+1$.

Let $e_i$ denote the $i$-th unit vector in $\{0,1\}^n$, and define $y_1 = e_1, y_2 = e_2, \ldots, y_k = e_k$ and $y_{k+1} = e_{k+1} + \cdots + e_s$. Then the monomial $m = a_I \prod_{i \in I} x_i$ equals 1 on $\sum_{i \in [k+1]} y_i = e_1 + \cdots + e_s$ and vanishes on $\sum_{i \in S} y_i$ for every $\emptyset \ne S \subsetneq [k+1]$. Thus $T_m(y_1, \ldots, y_{k+1}) \ne 0$. On the other hand, for any other monomial, say, $m' = a_{I'} \prod_{i \in I'} x_i$ with a nonzero coefficient in the representation of $f$, $T_{m'}(y_1, \ldots, y_{k+1}) = 0$. Indeed, if $|I'| \le k$ this holds by the first part of the proof. Otherwise, by the minimality of $I$, $I' \not\subseteq I$, and hence $m'(\sum_{i \in S} y_i) = 0$ for every $S \subseteq [k+1]$. Altogether this implies that $T_f(y_1, y_2, \ldots, y_{k+1}) = 1$, contradicting the assumption.

This completes the proof of Claim 1.
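Claim 1, together with the unique $GF(2)$ representation used in its proof, can be sanity-checked exhaustively for small $n$. The sketch below is a brute-force check, not part of the paper: it computes the coefficients $a_I$ with the standard binary Möbius transform and compares membership in $\mathcal{P}_k$ with condition (4) for all 256 functions on $n = 3$ variables. Truth tables are lists indexed by integer-encoded inputs, and the helper names are hypothetical.

```python
from itertools import product

def anf(table):
    """Binary Moebius transform: truth table of length 2^n -> coefficients,
    where a[m] is a_I for the monomial prod_{i in I} x_i, I = bits of m."""
    a = list(table)
    n = len(a).bit_length() - 1
    for i in range(n):
        for m in range(len(a)):
            if m >> i & 1:
                a[m] ^= a[m ^ (1 << i)]
    return a

def in_P(table, k):
    """f in P_k: no free term and every monomial has degree at most k."""
    a = anf(table)
    return a[0] == 0 and all(bin(m).count("1") <= k for m in range(len(a)) if a[m])

def passes_all_checks(table, k):
    """Condition (4): T_f(y_1, ..., y_{k+1}) = 0 for every (k+1)-tuple."""
    N = len(table)
    for ys in product(range(N), repeat=k + 1):
        total = 0
        for mask in range(1, 1 << (k + 1)):
            point = 0
            for i, v in enumerate(ys):
                if mask >> i & 1:
                    point ^= v
            total ^= table[point]
        if total:
            return False
    return True
```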

4 A One-Sided Tester for Low Degree Polynomials over $\{0,1\}^n$

In this section we present and analyze a one-sided tester for $\mathcal{P}_k$. This tester generalizes the linearity tester of Blum, Luby and Rubinfeld [10].

Algorithm Test-$\mathcal{P}_k$

1. Uniformly and independently select $\Theta\big(\frac{1}{2^k \epsilon} + k 2^k\big)$ groups of vectors. Each group contains $k+1$ uniformly selected random vectors $y_1, \ldots, y_{k+1} \in \{0,1\}^n$.

2. If for some group of vectors $y_1, \ldots, y_{k+1}$ it holds that $T_f(y_1, \ldots, y_{k+1}) \ne 0$, then reject; otherwise, accept.

Theorem 1. The algorithm Test-$\mathcal{P}_k$ is a one-sided tester for $\mathcal{P}_k$ with query complexity $O\big(\frac{1}{\epsilon} + k\,2^{2k}\big)$.

From the test definition and from Claim 1 it is obvious that if $f \in \mathcal{P}_k$, then the tester accepts. Thus, the crux of the proof is to show that if $f$ is $\epsilon$-far from $\mathcal{P}_k$, then the tester rejects with probability at least $2/3$. Our proof has a similar general structure to Sudan's analysis [18] of the linearity test in [10], but requires some additional ideas. In particular, if $f$ is the function tested, we can define a function $g$ as follows. For any $y \in \{0,1\}^n$:

$$g(y) = 1 \ \text{ if } \Pr_{y_2, \ldots, y_{k+1} \in \{0,1\}^n}\big[T_f^y(y_2, \ldots, y_{k+1}) = 1\big] \ge 1/2, \ \text{ and } g(y) = 0 \text{ otherwise.} \qquad (5)$$

Thus $g$ is a kind of majority function. That is, for every vector $y \in \{0,1\}^n$, $g(y)$ is chosen to satisfy most of the equations $T_f^y(y_2, \ldots, y_{k+1}) = g(y)$. We also define

$$\eta \stackrel{\text{def}}{=} \Pr_{y_1, \ldots, y_{k+1} \in \{0,1\}^n}\big[T_f(y_1, \ldots, y_{k+1}) \ne 0\big] = \Pr_{y_1, \ldots, y_{k+1} \in \{0,1\}^n}\big[T_f^{y_1}(y_2, \ldots, y_{k+1}) \ne f(y_1)\big]. \qquad (6)$$

Note that $\eta$ is simply the probability that a single group of vectors $y_1, \ldots, y_{k+1}$ selected by the algorithm provides evidence that $f \notin \mathcal{P}_k$. We shall prove two claims. The first and simpler claim (in Lemma 2) is that if $\eta$ is small, then $g$ is close to $f$. The second and more involved claim (in Lemma 5) is that if $\eta$ is small, then $g$ must belong to $\mathcal{P}_k$. This would suffice for proving the correctness of a slight variation on our algorithm that uses a larger sample size. In order to attain the sample complexity claimed in Theorem 1, we shall need to prove one more claim that deals with the case in which $\eta$ is very small (see Lemma 6).

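Lemma 2. For a fixed function $f$, let $g$ and $\eta$ be as defined in Equations (5) and (6), respectively. Then, $\mathrm{dist}(f,g) \le 2\eta$.

Proof. Recall that for every $y \in \{0,1\}^n$, $\Pr_{y_2, \ldots, y_{k+1} \in \{0,1\}^n}\big[T_f^y(y_2, \ldots, y_{k+1}) = g(y)\big] \ge 1/2$. Hence

$$\begin{aligned}
\eta &= \Pr_{y_1, \ldots, y_{k+1}}\big[T_f^{y_1}(y_2, \ldots, y_{k+1}) \ne f(y_1)\big] \\
&= \frac{1}{2^n} \sum_{y \in \{0,1\}^n} \Pr_{y_2, \ldots, y_{k+1} \in \{0,1\}^n}\big[T_f^y(y_2, \ldots, y_{k+1}) \ne f(y)\big] \\
&\ge \frac{1}{2^n} \sum_{y \in \Delta(f,g)} \Pr_{y_2, \ldots, y_{k+1} \in \{0,1\}^n}\big[T_f^y(y_2, \ldots, y_{k+1}) = g(y)\big] \\
&\ge \frac{1}{2} \cdot \frac{|\Delta(f,g)|}{2^n}.
\end{aligned}$$

Thus, $\mathrm{dist}(f,g) = \frac{|\Delta(f,g)|}{2^n} \le 2\eta$.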
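For small $n$, the function $g$, the quantity $\eta$, and the bound of Lemma 2 can be computed exactly by enumeration. The sketch below is a brute-force check under the assumption that $f$ is given as a truth table indexed by integer-encoded inputs (addition is XOR); the function names are hypothetical and exact rational arithmetic is used.

```python
from fractions import Fraction
from itertools import product

def T_y(table, y, rest):
    """T_f^y(rest): GF(2)-sum of f over the non-empty subsets of (y, *rest)
    other than {y}."""
    vs = (y, *rest)
    total = 0
    for mask in range(2, 1 << len(vs)):   # mask 1 would be the singleton {y}
        point = 0
        for i, v in enumerate(vs):
            if mask >> i & 1:
                point ^= v
        total ^= table[point]
    return total

def g_and_eta(table, k):
    """Exact g (Equation (5)) and eta (Equation (6)) by full enumeration."""
    N = len(table)
    g, bad = [], 0
    for y in range(N):
        ones = 0
        for rest in product(range(N), repeat=k):
            t = T_y(table, y, rest)
            ones += t
            bad += t != table[y]
        g.append(1 if 2 * ones >= N ** k else 0)   # Pr[T_f^y = 1] >= 1/2
    return g, Fraction(bad, N ** (k + 1))

def lemma2_holds(table, k):
    """Check dist(f, g) <= 2 * eta exactly."""
    g, eta = g_and_eta(table, k)
    d = Fraction(sum(a != b for a, b in zip(table, g)), len(table))
    return d <= 2 * eta
```

When $f$ is $x_1x_2$ on four variables with a single value flipped, the majority function $g$ computed this way recovers $x_1x_2$ exactly.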



Recall that by the definition of $g$ as a majority function, for every $y$, we have that for at least one half of the $k$-tuples of vectors $y_2, \ldots, y_{k+1}$, $T_f^y(y_2, \ldots, y_{k+1}) = g(y)$. In the next lemma we show that this equality actually holds for a vast majority of the $k$-tuples $y_2, \ldots, y_{k+1}$ (assuming $\eta$ is sufficiently small).

Lemma 3. For every $y \in \{0,1\}^n$: $\Pr_{y_2, \ldots, y_{k+1} \in \{0,1\}^n}\big[g(y) = T_f^y(y_2, \ldots, y_{k+1})\big] \ge 1 - 2k\eta$.

In order to prove Lemma 3 we shall first establish the following claim.

Claim 4. For every $y, z, w, y_2, \ldots, y_k \in \{0,1\}^n$,

$$T_f(y, y_2, \ldots, y_k, w) + T_f(y, y_2, \ldots, y_k, z) = T_f(y+w, y_2, \ldots, y_k, y+w+z) + T_f(y+z, y_2, \ldots, y_k, y+w+z). \qquad (7)$$

Proof. Let $Y = \{y_2, \ldots, y_k\}$, and consider any set $I \subseteq \{2, \ldots, k\}$, which may be the empty set. For a vector $x \in \{0,1\}^n$ denote $f_{Y,I}(x) \stackrel{\text{def}}{=} f\big(\sum_{i \in I} y_i + x\big)$.

For every set $I \subseteq \{2, \ldots, k\}$, each element of type $f(\sum_{i \in I} y_i)$ appears twice in both sides of Equation (7) and thus cancels out. Now for every set $I \subseteq \{2, \ldots, k\}$ (including the empty set), we get in the left hand side of Equation (7):

$$f_{Y,I}(y) + f_{Y,I}(w) + f_{Y,I}(y+w) + f_{Y,I}(y) + f_{Y,I}(z) + f_{Y,I}(y+z).$$

In the right hand side of Equation (7) we get:

$$f_{Y,I}(y+w) + f_{Y,I}(y+z+w) + f_{Y,I}(z) + f_{Y,I}(y+z) + f_{Y,I}(y+w+z) + f_{Y,I}(w).$$

This implies equality over $GF(2)$.
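Since Claim 4 is an identity that holds for every function $f$, it can be spot-checked exhaustively over a small domain. The brute-force sketch below (hypothetical helper names; vectors as $n$-bit integers, addition as XOR, $f$ as a truth table) verifies Equation (7) for a few arbitrary tables with $n = 3$ and $k = 1, 2$.

```python
from itertools import product

def T(table, ys):
    """T_f(ys) of Equation (2) for a truth table indexed by n-bit ints."""
    total = 0
    for mask in range(1, 1 << len(ys)):
        point = 0
        for i, v in enumerate(ys):
            if mask >> i & 1:
                point ^= v
        total ^= table[point]
    return total

def claim4_holds(table, n, k):
    """Check Equation (7) for all y, z, w, y_2, ..., y_k in {0,1}^n."""
    N = 1 << n
    for y, z, w, *mid in product(range(N), repeat=k + 2):   # mid = y_2..y_k
        lhs = T(table, [y, *mid, w]) ^ T(table, [y, *mid, z])
        rhs = (T(table, [y ^ w, *mid, y ^ w ^ z])
               ^ T(table, [y ^ z, *mid, y ^ w ^ z]))
        if lhs != rhs:
            return False
    return True
```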

We now turn to prove Lemma 3.

Proof of Lemma 3: We fix $y \in \{0,1\}^n$ and let $\gamma \stackrel{\text{def}}{=} \Pr_{y_2, \ldots, y_{k+1} \in \{0,1\}^n}\big[g(y) = T_f^y(y_2, \ldots, y_{k+1})\big]$. Recall that we are interested in proving that $\gamma \ge 1 - 2k\eta$. To this end, we shall bound a slightly different, but related probability. Let

$$\delta \stackrel{\text{def}}{=} \Pr_{y_2, \ldots, y_{k+1}, z_2, \ldots, z_{k+1} \in \{0,1\}^n}\big[T_f^y(y_2, \ldots, y_{k+1}) = T_f^y(z_2, \ldots, z_{k+1})\big]. \qquad (8)$$

Then, by the definitions of $\gamma$ and $\delta$,

$$\begin{aligned}
\delta &= \Pr\big[T_f^y(y_2, \ldots, y_{k+1}) = g(y) \text{ and } T_f^y(z_2, \ldots, z_{k+1}) = g(y)\big] \\
&\quad + \Pr\big[T_f^y(y_2, \ldots, y_{k+1}) \ne g(y) \text{ and } T_f^y(z_2, \ldots, z_{k+1}) \ne g(y)\big] \\
&= \gamma^2 + (1-\gamma)^2,
\end{aligned} \qquad (9)$$

where the probabilities are over the choice of $y_2, \ldots, y_{k+1}, z_2, \ldots, z_{k+1} \in \{0,1\}^n$. Since we are working over $GF(2)$,

$$\delta = \Pr_{y_2, \ldots, y_{k+1}, z_2, \ldots, z_{k+1} \in \{0,1\}^n}\big[T_f(y, y_2, \ldots, y_{k+1}) + T_f(y, z_2, \ldots, z_{k+1}) = 0\big].$$

Now, for any choice of $y_2, \ldots, y_{k+1}$ and $z_2, \ldots, z_{k+1}$,

$$\begin{aligned}
T_f(y, y_2, \ldots, y_{k+1}) + T_f(y, z_2, \ldots, z_{k+1}) = {} & T_f(y, y_2, \ldots, y_{k+1}) + T_f(y, y_2, \ldots, y_k, z_{k+1}) \\
& + T_f(y, y_2, \ldots, y_k, z_{k+1}) + T_f(y, y_2, \ldots, y_{k-1}, z_k, z_{k+1}) \\
& + T_f(y, y_2, \ldots, y_{k-1}, z_k, z_{k+1}) + T_f(y, y_2, \ldots, y_{k-2}, z_{k-1}, z_k, z_{k+1}) \\
& + \cdots \\
& + T_f(y, y_2, z_3, \ldots, z_{k+1}) + T_f(y, z_2, \ldots, z_{k+1}).
\end{aligned}$$

Consider any pair $T_f(y, y_2, \ldots, y_\ell, z_{\ell+1}, \ldots, z_{k+1}) + T_f(y, y_2, \ldots, y_{\ell-1}, z_\ell, \ldots, z_{k+1})$ that appears in the above sum. Note that these two terms differ only in a single parameter. Since $T_f(\cdot)$ is a symmetric function we can apply Claim 4 and obtain that

$$\begin{aligned}
& T_f(y, y_2, \ldots, y_\ell, z_{\ell+1}, \ldots, z_{k+1}) + T_f(y, y_2, \ldots, y_{\ell-1}, z_\ell, \ldots, z_{k+1}) \\
&= T_f(y + y_\ell, y_2, \ldots, y_{\ell-1}, z_{\ell+1}, \ldots, z_{k+1}, y + y_\ell + z_\ell) \\
&\quad + T_f(y + z_\ell, y_2, \ldots, y_{\ell-1}, z_{\ell+1}, \ldots, z_{k+1}, y + y_\ell + z_\ell).
\end{aligned} \qquad (10)$$

Recall that $y$ is fixed and $y_2, \ldots, y_{k+1}, z_2, \ldots, z_{k+1} \in \{0,1\}^n$ are uniformly selected, and so all parameters on the right hand side of the above equation are uniformly distributed. Also recall that by the definition of $\eta$, for $T_f(r_1, \ldots, r_{k+1})$, where $r_1, \ldots, r_{k+1}$ are uniformly selected at random, $\Pr_{r_1, \ldots, r_{k+1} \in \{0,1\}^n}\big[T_f(r_1, \ldots, r_{k+1}) \ne 0\big] = \eta$. Since the sum above consists of $k$ such pairs, that is, $2k$ terms of this form after applying (10), the union bound gives

$$\delta = \Pr_{y_2, \ldots, y_{k+1}, z_2, \ldots, z_{k+1} \in \{0,1\}^n}\big[T_f(y, y_2, \ldots, y_{k+1}) + T_f(y, z_2, \ldots, z_{k+1}) = 0\big] \ge 1 - 2k\eta. \qquad (11)$$

By combining Equations (9) and (11) we get that $\gamma^2 + (1-\gamma)^2 \ge 1 - 2k\eta$. Since $\gamma \ge 1/2$ it follows that $\gamma = \gamma^2 + \gamma(1-\gamma) \ge \gamma^2 + (1-\gamma)^2 \ge 1 - 2k\eta$. □
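Lemma 3 holds for every $f$, so it too can be checked exactly on small domains. The sketch below (hypothetical helper names; truth tables indexed by integer-encoded inputs, XOR as addition, exact fractions) computes $\eta$, the majority value $g(y)$, and the agreement probability $\gamma$ for every $y$, and compares $\gamma$ with $1 - 2k\eta$. The bound is only informative when $\eta$ is small, as for the slightly corrupted $x_1x_2$ used in the check.

```python
from fractions import Fraction
from itertools import product

def T_y(table, y, rest):
    """T_f^y(rest): GF(2)-sum of f over non-empty subsets of (y, *rest) except {y}."""
    vs = (y, *rest)
    total = 0
    for mask in range(2, 1 << len(vs)):
        point = 0
        for i, v in enumerate(vs):
            if mask >> i & 1:
                point ^= v
        total ^= table[point]
    return total

def lemma3_holds(table, k):
    """Check Pr[g(y) = T_f^y(y_2, ..., y_{k+1})] >= 1 - 2k*eta for every y."""
    N = len(table)
    rests = list(product(range(N), repeat=k))
    values = {y: [T_y(table, y, r) for r in rests] for y in range(N)}
    # eta = Pr[T_f^{y_1}(y_2, ..., y_{k+1}) != f(y_1)], Equation (6).
    eta = Fraction(sum(t != table[y] for y in range(N) for t in values[y]),
                   N ** (k + 1))
    for y in range(N):
        g_y = 1 if 2 * sum(values[y]) >= len(rests) else 0   # Equation (5)
        gamma = Fraction(sum(t == g_y for t in values[y]), len(rests))
        if gamma < 1 - 2 * k * eta:
            return False
    return True
```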

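Lemma 5. If $\eta < \frac{1}{(4k+2)2^k}$, the function $g$ belongs to $\mathcal{P}_k$.

Proof. By Claim 1 it suffices to prove that if $\eta < \frac{1}{(4k+2)2^k}$, then $T_g(y_1, \ldots, y_{k+1}) = 0$ for every $y_1, \ldots, y_{k+1} \in \{0,1\}^n$. Let us fix the choice of $y_1, \ldots, y_{k+1}$, and recall that, as defined in Equation (2), $T_g(y_1, \ldots, y_{k+1}) = \sum_{\emptyset \ne I \subseteq [k+1]} g\big(\sum_{i \in I} y_i\big)$. Suppose we uniformly select $k \cdot (k+1)$ random vectors $z_{i,j} \in \{0,1\}^n$, $1 \le i \le k+1$, $1 \le j \le k$. For a fixed $I$, the $k$ vectors $\sum_{i \in I} z_{i,j}$, $j = 1, \ldots, k$, are independent and uniformly distributed. Hence, by Lemma 3, for every $I$, $\emptyset \ne I \subseteq [k+1]$, with probability at least $1 - 2k\eta$ over the choice of the $z_{i,j}$'s,

$$g\Big(\sum_{i \in I} y_i\Big) = T_f^{\sum_{i \in I} y_i}\Big(\sum_{i \in I} z_{i,1}, \ldots, \sum_{i \in I} z_{i,k}\Big). \qquad (12)$$

Let $E_1$ be the event that Equation (12) holds for all $\emptyset \ne I \subseteq [k+1]$. By the union bound:

$$\Pr[E_1] \ge 1 - (2^{k+1} - 1) \cdot 2k\eta. \qquad (13)$$

Assume that $E_1$ holds. Using $T_f^x(u_1, \ldots, u_k) = \sum_{\emptyset \ne J \subseteq [k]} \big[f(\sum_{j \in J} u_j) + f(x + \sum_{j \in J} u_j)\big]$, we get

$$\begin{aligned}
T_g(y_1, \ldots, y_{k+1}) &= \sum_{\emptyset \ne I \subseteq [k+1]} \sum_{\emptyset \ne J \subseteq [k]} \Big[ f\Big(\sum_{i \in I} \sum_{j \in J} z_{i,j}\Big) + f\Big(\sum_{i \in I} y_i + \sum_{i \in I} \sum_{j \in J} z_{i,j}\Big) \Big] \\
&= \sum_{\emptyset \ne J \subseteq [k]} \sum_{\emptyset \ne I \subseteq [k+1]} f\Big(\sum_{i \in I} \sum_{j \in J} z_{i,j}\Big) + \sum_{\emptyset \ne J \subseteq [k]} \sum_{\emptyset \ne I \subseteq [k+1]} f\Big(\sum_{i \in I} \Big(y_i + \sum_{j \in J} z_{i,j}\Big)\Big) \\
&= \sum_{\emptyset \ne J \subseteq [k]} T_f\Big(\sum_{j \in J} z_{1,j}, \ldots, \sum_{j \in J} z_{k+1,j}\Big) + \sum_{\emptyset \ne J \subseteq [k]} T_f\Big(y_1 + \sum_{j \in J} z_{1,j}, \ldots, y_{k+1} + \sum_{j \in J} z_{k+1,j}\Big).
\end{aligned} \qquad (14)$$

Let $E_2$ be the event that for every $\emptyset \ne J \subseteq [k]$, $T_f\big(\sum_{j \in J} z_{1,j}, \ldots, \sum_{j \in J} z_{k+1,j}\big) = 0$ and $T_f\big(y_1 + \sum_{j \in J} z_{1,j}, \ldots, y_{k+1} + \sum_{j \in J} z_{k+1,j}\big) = 0$. Each of these $2(2^k - 1)$ terms has uniformly distributed arguments, so by the definition of $\eta$ and the union bound:

$$\Pr[E_2] \ge 1 - 2(2^k - 1)\eta. \qquad (15)$$

Suppose that $\eta < \frac{1}{(4k+2)2^k}$. Then, by Equations (13) and (15), the probability that both $E_1$ and $E_2$ hold is strictly positive. In other words, there exists a choice of the $z_{i,j}$'s for which all summands in Equation (14) are 0. But this implies that $T_g(y_1, \ldots, y_{k+1}) = 0$. We conclude that if $\eta < \frac{1}{(4k+2)2^k}$, then $g$ belongs to $\mathcal{P}_k$, and this completes the lemma's proof.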



By combining Lemmas 2 and 5 we obtain that if $f$ is $\Omega(1/(k2^k))$-far from $\mathcal{P}_k$, then $\eta = \Omega(1/(k2^k))$, and so the algorithm rejects $f$ with sufficiently high constant probability (since it selects $\Omega(k2^k)$ groups of vectors $y_1, \ldots, y_{k+1}$). We next deal with the case in which $\eta$ is small. By Lemma 2, in this case the distance $d = \mathrm{dist}(f,g)$ between $f$ and $g$ is small, and we show that the test rejects $f$ with probability that is close to $(2^{k+1} - 1)d$. This follows from the fact that in this case, the probability over the selection of $y_1, \ldots, y_{k+1}$ that, among the $(2^{k+1} - 1)$ points $\sum_{i \in S} y_i$ ($\emptyset \ne S \subseteq [k+1]$), the functions $f$ and $g$ differ in precisely one point, is close to $(2^{k+1} - 1)d$. This is formally proved in the following lemma.



Lemma 6. Suppose $0 < \eta < \frac{1}{(4k+2)2^k}$. Let $d \stackrel{\text{def}}{=} \mathrm{dist}(f,g)$ denote the distance between $f$ and $g$, and let

$$p \stackrel{\text{def}}{=} \frac{1 - (2^{k+1} - 1)d}{1 + (2^{k+1} - 1)d} \cdot (2^{k+1} - 1)d.$$

Then, when $y_1, y_2, \ldots, y_{k+1}$ are chosen randomly, the probability that for exactly one point $v$ among the $(2^{k+1} - 1)$ points $\sum_{i \in S} y_i$ ($\emptyset \ne S \subseteq [k+1]$), $f(v) \ne g(v)$, is at least $p$.



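By definition of $\eta$ and the above lemma, $\eta \ge p$ (under the premise of the lemma): by Lemma 5, $g \in \mathcal{P}_k$, so $T_g(y_1, \ldots, y_{k+1}) = 0$, and whenever $f$ and $g$ differ on exactly one of the points $\sum_{i \in S} y_i$ we get $T_f(y_1, \ldots, y_{k+1}) = 1$. In particular, since (by Lemma 2) $d \le 2\eta < \frac{2}{(4k+2)2^k}$ and $k \ge 1$, we have $(2^{k+1} - 1)d < \frac{1}{2}$, and hence $\eta \ge \frac{1}{3}(2^{k+1} - 1)d$; moreover, for fixed $k$, as $d$ tends to zero, $\eta \ge (2^{k+1} - 1)d - O(d^2)$.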

Proof. For each subset $S$, $\emptyset \ne S \subseteq [k+1]$, let $X_S$ be the indicator random variable whose value is 1 if and only if $f\big(\sum_{i \in S} y_i\big) \ne g\big(\sum_{i \in S} y_i\big)$. Obviously, $\Pr[X_S = 1] = d$ for every $S$. It is not difficult to check that the random variables $X_S$ are pairwise independent, since for any two distinct nonempty $S_1, S_2$, the sums $\sum_{i \in S_1} y_i$ and $\sum_{i \in S_2} y_i$ attain each pair of values in $\{0,1\}^n$ with equal probability when the vectors $y_i$ are chosen randomly and independently. It follows that the random variable $X = \sum_S X_S$, which counts the number of points $v$ of the required form in which $f(v) \ne g(v)$, has expectation $E[X] = (2^{k+1} - 1)d$ and variance $\mathrm{Var}[X] = (2^{k+1} - 1)d(1-d) \le E[X]$. Our objective is to lower bound the probability that $X = 1$. We need the well known, simple fact that for a random variable $X$ that attains nonnegative, integer values,

$$\Pr[X > 0] \ge \frac{(E[X])^2}{E[X^2]}.$$

Indeed, if $X$ attains the value $i$ with probability $p_i$ for $i \ge 0$, then, by Cauchy-Schwarz,

$$(E[X])^2 = \Big(\sum_{i > 0} i p_i\Big)^2 = \Big(\sum_{i > 0} i\sqrt{p_i} \cdot \sqrt{p_i}\Big)^2 \le \Big(\sum_{i > 0} i^2 p_i\Big)\Big(\sum_{i > 0} p_i\Big) = E[X^2] \Pr[X > 0].$$

In our case, this implies

$$\Pr[X > 0] \ge \frac{(E[X])^2}{E[X^2]} = \frac{(E[X])^2}{\mathrm{Var}[X] + (E[X])^2} \ge \frac{(E[X])^2}{E[X] + (E[X])^2} = \frac{E[X]}{1 + E[X]}.$$

Therefore

$$E[X] \ge \Pr[X = 1] + 2\big(\Pr[X > 0] - \Pr[X = 1]\big) = 2\Pr[X > 0] - \Pr[X = 1] \ge \frac{2E[X]}{1 + E[X]} - \Pr[X = 1],$$

implying that

$$\Pr[X = 1] \ge \frac{2E[X]}{1 + E[X]} - E[X] = \frac{E[X] - (E[X])^2}{1 + E[X]}.$$

Substituting the value of $E[X]$, the desired result follows.
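The second-moment bound can be checked numerically by enumerating all $(k+1)$-tuples for a small $n$, with the set $D = \Delta(f,g)$ chosen arbitrarily. The sketch below (hypothetical helper names, exact rational arithmetic, vectors as $n$-bit integers with XOR as addition) computes $\Pr[X = 1]$ and $E[X]$ exactly, where $X$ counts the non-empty $S$ with $\sum_{i \in S} y_i \in D$, and compares $\Pr[X = 1]$ with $p$.

```python
from fractions import Fraction
from itertools import product

def exact_stats(D, n, k):
    """Exact (Pr[X = 1], E[X]) for X = #{S : sum_{i in S} y_i in D}."""
    N = 1 << n
    hits = total = 0
    for ys in product(range(N), repeat=k + 1):
        X = 0
        for mask in range(1, 1 << (k + 1)):
            point = 0
            for i, v in enumerate(ys):
                if mask >> i & 1:
                    point ^= v
            X += point in D
        total += X
        hits += X == 1
    M = N ** (k + 1)
    return Fraction(hits, M), Fraction(total, M)

def lemma6_bound(d, k):
    """p from Lemma 6, with m = 2^(k+1) - 1."""
    m = 2 ** (k + 1) - 1
    return (1 - m * d) / (1 + m * d) * m * d
```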



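We are now ready to wrap up the proof of Theorem 1.

Proof of Theorem 1: As we have noted previously, if $f$ is in $\mathcal{P}_k$, then by Claim 1 the tester accepts (with probability 1). We next show that if $f$ is $\epsilon$-far from $\mathcal{P}_k$, then the tester rejects with probability at least $2/3$.

Suppose that $\mathrm{dist}(f, \mathcal{P}_k) > \epsilon$. Denote $d = \mathrm{dist}(f,g)$. If $\eta < \frac{1}{(4k+2)2^k}$, then by Lemma 5 $g \in \mathcal{P}_k$, so that $d > \epsilon$, and, by Lemma 6, $\eta = \Omega(2^k d) = \Omega(2^k \epsilon)$. Hence, $\eta \ge \min\big(\Omega(2^k \epsilon), \frac{1}{(4k+2)2^k}\big)$. Clearly it is enough to perform $O(1/\eta) = O\big(\frac{1}{2^k \epsilon} + k2^k\big)$ rounds of the algorithm in order to detect a violation with probability at least $2/3$; since each round queries $f$ at $2^{k+1} - 1$ points, this gives the query complexity $O\big(\frac{1}{\epsilon} + k\,2^{2k}\big)$. This completes the proof of the theorem. □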



4.1 Self-correcting and a Lower Bound 

From Lemmas 2, 3, and 5 one can immediately conclude the following: 

Corollary 7. Consider a function $f : \{0,1\}^n \to \{0,1\}$ that is $\epsilon$-close to a degree-$k$ polynomial $g : \{0,1\}^n \to \{0,1\}$, where $\epsilon < \frac{1}{(4k+2)2^k}$. Then the function $f$ can be self-corrected. That is, for any given $x \in \{0,1\}^n$, it is possible to obtain the value $g(x)$
with probability at least 1 — ek by querying f on 2^ — 1 points in { 0 , 1 }". 

The following is a lower bound on families of functions that correspond to linear codes.

Theorem 2. Let $\mathcal{F}$ be any family of functions $f : \{0,1\}^n \to \{0,1\}$ that corresponds to a linear code $C$. Let $d$ denote the minimum distance of the code $C$ and let $\bar{d}$ denote the minimum distance of the dual code of $C$.

Every testing algorithm for the family $\mathcal{F}$ must perform $\Omega(\bar{d})$ queries, and if the distance parameter $\epsilon$ is at most $d/2^{n+1}$, then $\Omega(1/\epsilon)$ is also a lower bound for the necessary number of queries.
As noted in the introduction, the family $\mathcal{P}_k$ corresponds to the shortened Reed-Muller code $\mathcal{R}(k,n)^*$. It is well known (see [16, Chap. 13]) that the distance of $\mathcal{R}(k,n)^*$ is $2^{n-k}$ and the distance of the dual code (which is a punctured Reed-Muller code) is $2^{k+1} - 1$. Hence we obtain the following corollary.

Corollary 8. Every algorithm for testing $\mathcal{P}_k$ with distance parameter $\epsilon$ must perform $\Omega\big(\max\big(\frac{1}{\epsilon}, 2^{k+1}\big)\big)$ queries.
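The distance $2^{n-k}$ of the shortened Reed-Muller code can be confirmed by brute force for small parameters: enumerate every nonzero $f \in \mathcal{P}_k$ through its coefficients $a_S$ and take the minimum Hamming weight on the $2^n - 1$ nonzero points (for a linear code this equals the minimum distance). This is an illustrative check rather than part of the paper; the helper names are hypothetical.

```python
from itertools import combinations

def monomials(n, k):
    """Bitmasks m with 1 <= popcount(m) <= k; m stands for prod_{i in m} x_i."""
    return [sum(1 << i for i in I)
            for size in range(1, k + 1)
            for I in combinations(range(n), size)]

def min_distance_shortened_rm(n, k):
    """Minimum weight of a nonzero f in P_k on the 2^n - 1 nonzero points."""
    mons = monomials(n, k)
    best = None
    for coeffs in range(1, 1 << len(mons)):          # every nonzero f in P_k
        weight = 0
        for p in range(1, 1 << n):                   # coordinate 0 is deleted
            v = 0
            for j, m in enumerate(mons):
                if coeffs >> j & 1 and p & m == m:   # monomial m equals 1 at p
                    v ^= 1
            weight += v
        best = weight if best is None else min(best, weight)
    return best
```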

Proof of Theorem 2: We start with showing that $\Omega(\bar{d})$ queries are necessary. A well known fact from coding theory (see [16, Chap. 5]) states the following: for every linear code $C$ whose dual code has distance $\bar{d}$, if we examine a sub-word of length less than $\bar{d}$ of a uniformly selected codeword in $C$, then the resulting sub-word is uniformly distributed over all binary words of that length. Hence it is not possible to distinguish between a random codeword in $C$ and a uniformly random word (which with high probability is far from any codeword) using less than $\bar{d}$ queries.

We now turn to the case $\epsilon \le d/2^{n+1}$. To prove the lower bound here, we apply, as usual, the Yao principle by defining two distributions, one of positive instances, and the other of negative ones, and then by showing that in order to distinguish between those distributions any algorithm must perform $\Omega(1/\epsilon)$ queries. The positive distribution has all its mass at the zero vector $\mathbf{0} = (0, \ldots, 0)$. To define the negative distribution, partition the set of all coordinates into $t = 1/\epsilon$ nearly equal parts $I_1, \ldots, I_t$ and give weight $1/t$ to each of the characteristic vectors $w_i$ of $I_i$, $i = 1, \ldots, t$. (Observe that indeed $\mathbf{0} \in C$ due to linearity, and $\mathrm{dist}(w_i, C) = \epsilon$ due to the assumption on the minimum distance of $C$.) Finally, a random instance is generated by first choosing one of the distributions with probability $1/2$, and then generating a vector according to the chosen distribution. It is easy to check (see, e.g., [1] for details) that in order to give a correct answer with probability at least $2/3$, the algorithm has to query $\Omega(1/\epsilon)$ bits of the input.

□
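The coding-theory fact used in the first part of the proof can be illustrated on the smallest interesting case, $\mathcal{P}_1$ with $n = 4$. Its code (the truth tables of the linear functions on the 15 nonzero points) has dual distance $2^{1+1} - 1 = 3$: any 2 coordinates of a uniformly random codeword are uniform on $\{0,1\}^2$, while some 3 coordinates are not (for example the points 1, 2 and 3, since $1 + 2 = 3$). The sketch is a brute-force illustration with hypothetical helper names.

```python
from collections import Counter
from itertools import combinations

def linear_code(n):
    """Codewords of P_1: truth tables of x -> <s, x> on the nonzero x."""
    return [[bin(p & s).count("1") % 2 for p in range(1, 1 << n)]
            for s in range(1 << n)]

def projection_is_uniform(codewords, coords):
    """Is a uniformly random codeword, restricted to `coords`, uniform?"""
    counts = Counter(tuple(c[i] for i in coords) for c in codewords)
    expected = len(codewords) / 2 ** len(coords)
    return (len(counts) == 2 ** len(coords)
            and all(v == expected for v in counts.values()))
```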

5 Concluding Remarks 

We first note that in view of the above lower bound, our upper bound is almost tight. 

It would be interesting to study analogous questions for other linear binary codes. Several recent papers, including [8] and [9], deal with related questions. As shown above, a code is not testable with a constant number of queries if its dual distance is not a constant, and it seems plausible to conjecture that if the dual distance is a constant, and there is a doubly transitive permutation group acting on the coordinates that maps the dual code to itself, then the code is testable with a constant number of queries. The automorphism group of punctured Reed-Muller codes contains the general linear group $GL(n, 2)$, and thus those codes supply an example with these properties. Another interesting example is the class of duals of BCH codes (this class also contains linear functions as a particular case). Another possible extension of the results could be the study of the testability of low-degree multivariate polynomials over small fields $GF(q)$. This situation corresponds to generalized Reed-Muller codes [15].
Abstract. Min-entropy is a statistical measure of the amount of randomness that a particular distribution contains. In this paper we investigate the notion of computational min-entropy, which is the computational analog of statistical min-entropy. We consider three possible definitions for this notion, and show equivalence and separation results for these definitions in various computational models.

We also study whether or not certain properties of statistical min-entropy have a computational analog. In particular, we consider the following questions:

1. Let $X$ be a distribution with high computational min-entropy. Does one get a pseudo-random distribution when applying a “randomness extractor” on $X$?

2. Let $X$ and $Y$ be (possibly dependent) random variables. Is the computational min-entropy of $(X, Y)$ at least as large as the computational min-entropy of $X$?

3. Let $X$ be a distribution over $\{0,1\}^n$ that is “weakly unpredictable” in the sense that it is hard to predict a constant fraction of the coordinates of $X$ with a constant bias. Does $X$ have computational min-entropy $\Omega(n)$?

We show that the answers to these questions depend on the computational model considered. In some natural models the answer is negative and in others it is positive. Our positive results for the third question exhibit models in which the “hybrid argument bottleneck” in “moving from a distinguisher to a predictor” can be avoided.



1 Introduction 

One of the most fundamental notions in theoretical computer science is that of computational indistinguishability [1,2]. Two probability distributions are deemed close if no efficient¹ test can tell them apart; this comes in stark contrast to the information theoretic view, which allows any test whatsoever. The discovery [3,2,4] that simple computational assumptions (namely the existence of one-way functions) make the computational and information theoretic notions completely different has been one of the most fruitful in CS history, with impact on cryptography, complexity theory and computational learning theory.

¹ What is meant by “efficient” can naturally vary by specifying machine models and resource bounds on them.

The most striking result of these studies has been the efficient construction of nontrivial pseudorandom distributions, namely ones which are information theoretically very far from the uniform distribution, but are nevertheless indistinguishable from it. Two of the founding papers [2,4] found it natural to extend information theory more generally to the computational setting, and attempt to define its most fundamental notion of entropy². The basic question is the following: when should we say that a distribution has (or is close to having) computational entropy (or pseudoentropy) $k$? Interestingly, these two papers give two very different definitions! This point may be overlooked, since for the most interesting special case, the case of pseudorandomness (i.e., when the distributions are over $n$-bit strings and $k = n$), the two definitions coincide. This paper is concerned with the other cases, namely $k < n$, attempting to continue the project of building a computational analog of information theory.



1.1 Definitions of Pseudoentropy 

To start, let us consider the two original definitions. Let $X$ be a probability distribution over a set $S$.

A definition using “compression”. Yao's definition of pseudoentropy [2] is based on compression. He cites Shannon's definition [5], defining $H(X)$ to be the minimum number of bits needed to describe a typical element of $X$. More precisely, one imagines the situation of Alice having to send Bob (a large number of) samples from $X$, and trying to save on communication. Then $H(X)$ is the smallest $k$ for which there exist a compression algorithm $A$ (for Alice) from $S$ into $k$-bit strings, and a decompression algorithm $B$ (for Bob) from $k$-bit strings into $S$, such that $B(A(x)) = x$ (in the limit, for typical $x$ from $X$). Yao takes this definition verbatim, adding the crucial computational constraint that both compression and decompression algorithms must be efficient. This notion of efficient compression is further studied in [6].

A definition using indistinguishability. Hastad et al.'s definition of pseudoentropy [4] extends the definition of pseudorandomness syntactically. As a distribution is said to be pseudorandom if it is indistinguishable from a distribution of maximum entropy (which is unique), they define a distribution to have pseudoentropy $k$ if it is indistinguishable from a distribution of Shannon entropy $k$ (for which there are many possibilities).

² While we will first mainly talk about Shannon's entropy, we later switch to min-entropy and stay with it throughout the paper. However, the whole introduction may be read regarding the term “entropy” as any other of its many formal variants, or just as the informal notion of “information content” or “uncertainty”.

It turns out that the two definitions of pseudoentropy above can be very different in natural computational settings, despite the fact that in the information theoretic setting they are identical for any $k$. Which definition, then, is the “natural one” to choose? This question is actually more complex, as another natural point of view leads to yet another definition.

A definition using a natural metric space. The computational viewpoint of randomness may be thought of as endowing the space of all probability distributions with new, interesting metrics.

For every event (= test) $T$ in our probability space we define: $d_T(X, Y) \stackrel{\text{def}}{=} \big|\Pr_X[T] - \Pr_Y[T]\big|$. In words, the distance between $X$ and $Y$ is the difference (in absolute value) of the probabilities they assign to $T$.³

Note that given a family of metrics, their maximum is also a metric. An information theoretic metric on distributions, the statistical distance⁴ (which is basically $L_1$-distance), is obtained by taking the maximum over the $T$-metrics above for all possible tests $T$. A natural computational metric is given by taking the maximum over any class $C$ of efficient tests. When should we say that a distribution $X$ is indistinguishable from having Shannon entropy $k$? Distance to a set is the distance to the closest point in it, so $X$ has to be close in this metric to some $Y$ with Shannon entropy $k$.
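For distributions with a tiny support, the statistical distance can be computed literally as the maximum of $d_T$ over all events $T$, and compared with half the $L_1$ distance; restricting the maximum to a given list of tests gives the analogous distance for a class $C$. The sketch below represents a distribution as a dictionary from outcomes to exact probabilities (an assumption for illustration); it is exponential in the support size, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

def d_T(X, Y, T):
    """|Pr_X[T] - Pr_Y[T]| for an event T given as a collection of outcomes."""
    return abs(sum(X.get(s, 0) for s in T) - sum(Y.get(s, 0) for s in T))

def class_distance(X, Y, tests):
    """Maximum of d_T over a given class of tests (events)."""
    return max(d_T(X, Y, T) for T in tests)

def statistical_distance(X, Y):
    """Maximum of d_T over all events T (brute force over every subset)."""
    support = sorted(set(X) | set(Y))
    all_events = chain.from_iterable(
        combinations(support, r) for r in range(len(support) + 1))
    return class_distance(X, Y, all_events)

def half_l1(X, Y):
    """Half of the L1 distance between the two probability vectors."""
    support = set(X) | set(Y)
    return sum(abs(X.get(s, 0) - Y.get(s, 0)) for s in support) / 2
```

A restricted class of tests can only see a smaller distance: with the two tests $\{a\}$ and $\{b\}$ below, the distance is $1/4$, while the statistical distance is $1/2$.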

A different order of quantifiers. At first sight this may look identical to the “indistinguishability” definition in [4]. However, let us parse them to see the difference. The [4] definition says that $X$ has pseudoentropy $k$ if there exists a distribution $Y$ of Shannon entropy $k$ such that for all tests $T$ in $C$, $T$ has roughly the same probability under both $X$ and $Y$. The metric definition above reverses the quantifiers: $X$ has pseudoentropy $k$ if for every test $T$ in $C$ there exists a distribution $Y$ of Shannon entropy $k$ such that $T$ has roughly the same probability under both $X$ and $Y$. It is easy to see that the metric definition is more liberal: it allows at least as many distributions to have pseudoentropy $k$. Are they really different?

Relations between the three definitions. As all these definitions are natural and well-motivated, it makes sense to study their relationship. In the information theoretic world (when ignoring the “efficiency” constraints) all definitions are equivalent. It is easy to verify that regardless of the choice of a class $C$ of “efficient” tests, they are ordered in permissiveness (allowing more distributions to have pseudoentropy $k$). The “indistinguishability” definition of [4] is the most stringent, then the “metric” definition, and then the “compression” definition of [2]. What is more interesting is that we can prove collapses and separations for different computational settings and assumptions. For example, we show that the first two definitions drastically differ for logspace observers, but coincide for polynomial time observers (both in the uniform and nonuniform settings). The proof of the latter statement uses the “min-max” Theorem of [7] to “switch” the order of quantifiers. We can show some weak form of equivalence between all three definitions for circuits. We show that the “metric” definition coincides with the “compression” definition if $NP \subseteq BPP$. More precisely, we give a nondeterministic reduction showing the equivalence of the two definitions. This reduction guarantees high min-entropy according to the “metric” definition if the distribution has high min-entropy according to the “compression” definition with respect to an NP oracle. A clean way to state this is that all three definitions are equivalent for PH/poly. We refer to this class as the class of poly-size PH-circuits. Such circuits are poly-size circuits which are allowed to compute an arbitrary function in the polynomial hierarchy (PH). We remark that similar circuits (for various levels of the PH hierarchy) arise in related contexts in the study of “computational randomness”: they come up in conditional “derandomization” results of AM [8,9,10] and “extractors for samplable distributions” [11].

³ This isn't precisely a metric, as there may be different $X$ and $Y$ such that $d_T(X, Y) = 0$. However, it is symmetric and satisfies the triangle inequality.

⁴ Another basic distance measure is the so-called KL-divergence, but for our purposes, which concern very close distributions, it is not much different from statistical distance.



1.2 Pseudoentropy versus Information Theoretic Entropy 

We now move to another important part of our project. As these definitions are supposed to help establish a computational version of information theory, we attempt to see which of them respect some natural properties of information-theoretic entropy.



Using randomness extractors. In the information-theoretic setting, there are randomness extractors which convert a high-entropy⁷ distribution into one which is statistically close to uniform. The theory of extracting the randomness from such distributions is by now quite developed (see surveys [12,13,14]). It is natural to expect that applying these randomness extractors on high-pseudoentropy distributions produces a pseudorandom distribution. In fact, this is the motivation for pseudoentropy in some previous works [15,4,16].

It is easy to see that the “indistinguishability” definition of [4] has this
property. This also holds for the “metric” definition by the equivalence above. 
Interestingly, we do not know whether this holds for the “compression” definition. 
Nevertheless, we show that some extractor constructions in the literature (the 
ones based on Trevisan’s technique [17,18,19,20,10]) do produce a pseudorandom 
distribution when working with the “compression” definition. 



⁷ It turns out that a different variant of entropy called “min-entropy” is the correct measure for this application. The min-entropy of a distribution X is $\log_2(\min_x 1/\Pr[X = x])$. This should be compared with Shannon’s entropy, in which the minimum is replaced by averaging.
The information in two dependent distributions. One basic principle in information theory is that two (possibly dependent) random variables have at least as much entropy as any one individually, e.g. $H(X,Y) \ge H(X)$. A natural question is whether this holds when we replace information-theoretic entropy with pseudoentropy. We show that the answer depends on the model of computation. If there exist one-way functions, then the answer is no for the standard model of polynomial-time distinguishers. On the other hand, if NP ⊆ BPP, then the answer is yes. Very roughly speaking, the negative part follows from the existence of pseudorandom generators, while the positive part follows from giving a nondeterministic reduction which relies on nondeterminism to perform approximate counting. Once again, this result can also be stated as saying that the answer is positive for poly-size PH-circuits. We remark that the positive result holds for (nonuniform) online space-bounded computation as well.

Entropy and unpredictability. A deeper and more interesting connection is the one between entropy and unpredictability. In the information-theoretic world, a distribution which is unpredictable has high entropy.⁸ Does this relation between entropy and unpredictability hold in the computational world?

Let us restrict ourselves here for a while to the metric definition of pseudoentropy. Two main results we prove are that this connection indeed holds for two natural computational notions of efficient observers. One is for logspace observers. The second is for PH-circuits. Both results use one mechanism: a different characterization of the metric definition, in which distinguishers accept very few inputs (fewer than $2^k$ when the pseudoentropy is k). We show that predictors for the accepted set are also good for any distribution “caught” by such a distinguisher. This direction is promising as it suggests a way to “bypass” the weakness of the “hybrid argument”.

The weakness of the hybrid argument. Almost all pseudorandom generators 
(whether conditional such as the ones for small circuits or unconditional such 
as the ones for logspace) use the hybrid argument in their proof of correctness. 
The idea is that if the output distribution can be efficiently distinguished from 
random, some bit can be efficiently predicted with nontrivial advantage. Thus, 
pseudorandomness is established by showing unpredictability. 

However, in standard form, if the distinguishability advantage is ε, the prediction advantage is only ε/n. In the results above, we manage (for these two computational models) to avoid this loss and make the prediction advantage Ω(ε) (just as information theory suggests).
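To make the ε/n loss concrete, here is a minimal Python sketch of the hybrid argument on a toy distribution. It assumes exact finite distributions stored as dicts from bit tuples to probabilities; the helper names (`hybrid`, `hybrid_gaps`) and the example test are hypothetical. Hybrid $H_j$ takes its first j bits from X and the remaining bits uniformly, so $\mathrm{E}[f(X)] - \mathrm{E}[f(U_n)]$ telescopes into n consecutive gaps, and at least one gap is at least a 1/n fraction of the total; that single gap is all a next-bit predictor is guaranteed to inherit.

```python
from itertools import product

def hybrid(px, j, n):
    """Distribution of H_j: the first j bits come from X, the last n - j bits are uniform."""
    out = {}
    for x, p in px.items():
        for tail in product((0, 1), repeat=n - j):
            y = x[:j] + tail
            out[y] = out.get(y, 0.0) + p / 2 ** (n - j)
    return out

def expectation(f, dist):
    return sum(p * f(x) for x, p in dist.items())

def hybrid_gaps(px, f, n):
    """Return (E[f(X)] - E[f(U_n)], list of differences between consecutive hybrids)."""
    vals = [expectation(f, hybrid(px, j, n)) for j in range(n + 1)]
    gaps = [vals[j] - vals[j - 1] for j in range(1, n + 1)]
    return vals[n] - vals[0], gaps

# Hypothetical example: X repeats one random bit n times; f tests "all bits equal".
n = 4
px = {(0,) * n: 0.5, (1,) * n: 0.5}
all_equal = lambda x: int(len(set(x)) == 1)
total, gaps = hybrid_gaps(px, all_equal, n)
print(total)  # 0.875
print(gaps)   # [0.0, 0.125, 0.25, 0.5]
```

Here the total advantage is 7/8 and the largest gap happens to be 1/2; in general only the averaging bound (largest gap at least total/n) is guaranteed.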

⁸ We consider two different forms of prediction tests: the first, called a “next-bit predictor”, attempts to predict a bit from the previous bits, whereas the second, called a “complement predictor”, has access to all the other bits, both previous and later.

While we have no concrete applications, this seems to have the potential to improve various constructions of pseudorandom generators. To see this, it suffices to observe the consequences of the hybrid argument loss. It requires every output bit of the generator to be very unpredictable, for which a direct cost is paid in the seed length (and complexity) of the generator. For generators against circuits, a long sequence of works [2,21,22,16] resolved it optimally using efficient hardness amplification. These results allow constructing distributions which are unpredictable even with advantage 1/poly(n). The above suggests that sometimes this amplification may not be needed. One may hope to construct a pseudorandom distribution by constructing a distribution which is only unpredictable with constant advantage, and then use a randomness extractor to obtain a pseudorandom distribution.⁹

This problem is even more significant when constructing generators against 
logspace machines [24,25]. The high unpredictability required seems to be the 
bottleneck for reducing the seed length in Nisan’s generator [24] and its refinements from $O((\log n)^2)$ bits to the optimal $O(\log n)$ bits (that will result in BPL = L). The argument above gives some hope that for fooling logspace machines (or even just constant-width oblivious branching programs) the suggested approach may yield substantial improvements. However, in this setup there is another hurdle: in [26] it was shown that randomness extraction cannot be done by one-pass logspace machines. Thus, in this setup it is not clear how to move from pseudoentropy to pseudorandomness.

1.3 Organization of the Paper 

In Section 2 we give some basic notation. Section 3 formally defines our three basic notions of pseudoentropy and proves a useful characterization of the metric definition. Section 4 discusses the use of randomness extractors. In Sections 5 and 6 we prove equivalence and separation results between the various definitions in several natural computational models. Section 7 is devoted to our results about computational analogs of information theory for concatenation and unpredictability of random variables. Because of space limitations, many of the proofs do not appear in this version.

2 Preliminaries 

Let X be a random variable over some set S. We say that X has (statistical) min-entropy at least k, denoted $H^{\infty}(X) \ge k$, if for every $x \in S$, $\Pr[X = x] \le 2^{-k}$. We use $U_n$ to denote the uniform distribution on $\{0,1\}^n$.
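The following minimal Python sketch contrasts min-entropy with Shannon entropy on small explicit distributions (dicts from outcomes to probabilities; the example distributions are hypothetical). A single heavy outcome caps the min-entropy even when the Shannon entropy, which averages instead of taking the minimum, stays large.

```python
import math

def min_entropy(dist):
    """H_inf(X) = log2(1 / max_x Pr[X = x]); dist maps outcomes to probabilities."""
    return -math.log2(max(dist.values()))

def shannon_entropy(dist):
    """H(X) = sum_x Pr[X = x] * log2(1 / Pr[X = x]), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical example on 3-bit strings: "000" has probability 1/2,
# the remaining 7 strings share the other 1/2 uniformly.
n = 3
strings = [format(v, f"0{n}b") for v in range(2 ** n)]
skewed = {s: (0.5 if s == "000" else 0.5 / (2 ** n - 1)) for s in strings}
uniform = {s: 1 / 2 ** n for s in strings}

print(min_entropy(skewed))      # 1.0
print(shannon_entropy(skewed))  # roughly 2.40
print(min_entropy(uniform), shannon_entropy(uniform))  # 3.0 3.0
```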

Let X, Y be two random variables over a set S. Let $f : S \to \{0,1\}$ be some function. The bias of X and Y with respect to f, denoted $\mathrm{bias}_f(X,Y)$, is defined by $|\mathrm{E}[f(X)] - \mathrm{E}[f(Y)]|$. Since it is sometimes convenient to omit the absolute value, we denote $\mathrm{bias}^{*}_f(X,Y) = \mathrm{E}[f(X)] - \mathrm{E}[f(Y)]$.

⁹ This approach was used in [16]. They show that even “weak” hardness amplification suffices to construct a high-pseudoentropy distribution using the pseudorandom generator construction of [23]. However, their technique relies on the properties of the specific generator and cannot be applied in general.

The statistical distance of X and Y, denoted $\mathrm{dist}(X,Y)$, is defined to be the maximum of $\mathrm{bias}_f(X,Y)$ over all functions f. Let C be a class of functions from S to {0,1} (e.g., the class of functions computed by circuits of size m for some integer m). The computational distance of X and Y w.r.t. C, denoted $\text{comp-dist}_C(X,Y)$, is defined to be the maximum of $\mathrm{bias}_f(X,Y)$ over all $f \in C$. We will sometimes drop the subscript C when it can be inferred from the context.
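As a sanity check of these definitions, the sketch below (Python, finite distributions as dicts; helper names and the two-test class `C` are hypothetical) computes $\mathrm{bias}_f$, the statistical distance in its closed form (half the L1 distance, attained by the indicator of $\{s : \Pr[X=s] > \Pr[Y=s]\}$), the same quantity by brute force over all $f : S \to \{0,1\}$, and the computational distance with respect to an explicitly listed class.

```python
from itertools import chain, combinations

def bias(f, px, py):
    """bias_f(X, Y) = |E[f(X)] - E[f(Y)]| for finite distributions given as dicts."""
    support = set(px) | set(py)
    return abs(sum(f(s) * (px.get(s, 0.0) - py.get(s, 0.0)) for s in support))

def statistical_distance(px, py):
    """Closed form of max_f bias_f(X, Y): half the L1 distance."""
    support = set(px) | set(py)
    return 0.5 * sum(abs(px.get(s, 0.0) - py.get(s, 0.0)) for s in support)

def max_bias_over_all_tests(px, py):
    """Brute force over every f: S -> {0,1}, i.e. over every subset D of the support."""
    support = sorted(set(px) | set(py))
    subsets = chain.from_iterable(combinations(support, r) for r in range(len(support) + 1))
    return max(bias(lambda s, D=frozenset(D): int(s in D), px, py) for D in subsets)

def comp_dist(tests, px, py):
    """Computational distance w.r.t. an explicitly listed class C of tests."""
    return max(bias(f, px, py) for f in tests)

px = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
py = {s: 0.25 for s in range(4)}
C = [lambda s: s % 2, lambda s: int(s == 3)]  # hypothetical small class of tests
print(statistical_distance(px, py))  # about 0.2
print(comp_dist(C, px, py))          # about 0.15
```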

Computational models. In addition to the standard model of uniform and nonuniform polynomial-time algorithms, we consider two additional computational models. The first is the model of PH-circuits. A PH-circuit is a boolean circuit that allows queries to a language in the polynomial hierarchy as a basic gate.¹⁰ The second model is the model of bounded-width read-once oblivious branching programs. A width-S read-once oblivious branching program P is a directed graph with Sn vertices, where the graph is divided into n layers, with S vertices in each layer. The edges of the graph only go from one layer to the next one, and each edge is labelled by a bit b ∈ {0,1} which is thought of as a variable. Each vertex has two outgoing edges, one labelled 0 and the other labelled 1. One of the vertices in the first layer is called the source vertex, and some of the vertices in the last layer are called the accepting vertices. A computation of the program P on input $x \in \{0,1\}^n$ consists of walking the graph for n steps, starting from the source vertex, and in step i taking the edge labelled by $x_i$. The output P(x) is 1 iff the end vertex is accepting. Note that variables are read in the natural order, and thus width-S read-once oblivious branching programs are the nonuniform analog of one-pass (or online) space-$\log S$ algorithms.
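A minimal Python sketch of evaluating a width-S read-once oblivious branching program. It assumes a representation that is not taken from the text: vertices of each layer are numbered 0..S−1 and `transitions[i][v]` holds the two outgoing edges used when reading $x_i$. The width-2 parity program is a hypothetical example.

```python
from typing import List, Sequence, Set, Tuple

# transitions[i][v] = (next vertex if x_i = 0, next vertex if x_i = 1); vertices are 0..S-1.
Layer = List[Tuple[int, int]]

def run_obp(transitions: Sequence[Layer], source: int, accepting: Set[int], x: Sequence[int]) -> int:
    """Evaluate a read-once oblivious branching program on x (each bit read once, in order)."""
    if len(x) != len(transitions):
        raise ValueError("expected one layer of transitions per input bit")
    v = source
    for layer, bit in zip(transitions, x):
        v = layer[v][bit]
    return int(v in accepting)

def parity_program(n: int) -> Tuple[List[Layer], int, Set[int]]:
    """Width-2 example: vertex 0 means 'even so far', vertex 1 means 'odd so far'."""
    layer = [(0, 1), (1, 0)]
    return [layer] * n, 0, {1}

prog, src, acc = parity_program(3)
print(run_obp(prog, src, acc, [1, 0, 1]))  # 0 (two ones: even parity)
print(run_obp(prog, src, acc, [1, 1, 1]))  # 1
```

Because only the current vertex is carried between steps, the program remembers at most log S bits about the prefix, which is what makes it the nonuniform analog of a one-pass space-log S algorithm.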

3 Defining Computational Min-entropy 

In this section we give three definitions for the notion of computational (or 
“pseudo”) min-entropy. In all these definitions, we fix C to be a class of functions 
which we consider to be efficiently computable. Our standard choice for this class 
will be the class of functions computed by a boolean circuit of size p(n), where n is the circuit’s input length and p(·) is some fixed polynomial. However, we will also be interested in instantiations of these definitions with respect to different classes C. We will also sometimes treat C as a class of sets rather than functions,
where we say that a set D is in C iff its characteristic function is in C. We will 
assume that the class C is closed under complement. 



3.1 HILL-type Pseudoentropy: Using Indistinguishability

We start with the standard definition of computational (or “pseudo”) min- 
entropy, as given by [4]. We call this definition HILL-type pseudoentropy. 

Definition 1. Let X be a random variable over a set S. Let ε > 0. We say that X has ε-HILL-type pseudoentropy at least k, denoted $H^{\mathrm{HILL}}_{\varepsilon}(X) \ge k$, if there exists a random variable Y with (statistical) min-entropy at least k such that the computational distance (w.r.t. C) of X and Y is at most ε.



¹⁰ Equivalently, the class of languages accepted by poly-size PH-circuits is PH/poly.

We will usually be interested in ε-pseudoentropy for ε that is a small constant. In these cases we will sometimes drop ε and simply say that X has (HILL-type) pseudoentropy at least k (denoted $H^{\mathrm{HILL}}(X) \ge k$).



3.2 Metric-Type Pseudoentropy: Using a Metric Space

In Definition 1 the distribution X has high pseudoentropy if there exists a high min-entropy Y such that X and Y are indistinguishable. As explained in the introduction, it is also natural to reverse the order of quantifiers: here we allow Y to be a function of the “distinguishing test” f.

Definition 2. Let X be a random variable over a set S. Let ε > 0. We say that X has ε-metric-type pseudoentropy at least k, denoted $H^{\mathrm{Metric}}_{\varepsilon}(X) \ge k$, if for every test f on S there exists a Y which has (statistical) min-entropy at least k and $\mathrm{bias}_f(X,Y) \le \varepsilon$.

It turns out that metric-pseudoentropy is equivalent to a different formulation. (Note that the condition below is only meaningful for D such that $|D| < 2^k$.) The proof of Lemma 1 appears in the full version.

Lemma 1. For every class C which is closed under complement and for every $k \le \log|S| - 1$ and ε, $H^{\mathrm{Metric}}_{\varepsilon}(X) \ge k$ if and only if for every set $D \in C$,

$$\Pr[X \in D] \le \frac{|D|}{2^k} + \varepsilon.$$
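To illustrate Lemma 1, the sketch below (Python; the universe, the threshold class and the distribution are hypothetical) checks the condition $\Pr[X \in D] \le |D|/2^k + \varepsilon$ for every set in an explicitly listed class that is closed under complement. Here $|S| = 16$, so the requirement $k \le \log|S| - 1 = 3$ holds for the values tried.

```python
def metric_violations(px, k, tests, eps):
    """Return the sets D in `tests` with Pr[X in D] > |D| / 2^k + eps (the condition of Lemma 1)."""
    violations = []
    for D in tests:
        mass = sum(p for x, p in px.items() if x in D)
        if mass > len(D) / 2 ** k + eps:
            violations.append(D)
    return violations

# Hypothetical setup: S = {0, ..., 15}; C = threshold sets {x < t} and their complements
# (so C is closed under complement); X is uniform on {0, 1, 2, 3}, i.e. min-entropy 2.
S = range(16)
C = [frozenset(x for x in S if x < t) for t in range(17)]
C += [frozenset(S) - D for D in C]
px = {x: 0.25 for x in range(4)}

print(len(metric_violations(px, 2, C, eps=0.1)))                      # 0
print(frozenset(range(4)) in metric_violations(px, 3, C, eps=0.1))  # True
```

So with respect to this toy class, X passes the test for k = 2 but the set {0, 1, 2, 3} certifies that its metric-type pseudoentropy is below 3.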



3.3 Yao-Type Pseudoentropy: Using Compression 

Let C be a class of functions which we consider to be efficiently computable. Recall that we said that a set D is a member of C if its characteristic function is in C. That is, a set D is in C if it is efficiently decidable. We now define a family $C_{\mathrm{compress}}$ of sets that are efficiently compressible. That is, we say that a set $D \subseteq S$ is in $C_{\mathrm{compress}}(\ell)$ if there exist functions $c, d \in C$ ($c : S \to \{0,1\}^{\ell}$ stands for compress and $d : \{0,1\}^{\ell} \to S$ for decompress) such that $D = \{x \mid d(c(x)) = x\}$. Note that every efficiently compressible set is also efficiently decidable (assuming the class C is closed under composition). Yao-type pseudoentropy is defined by replacing the quantification over $D \in C$ in the alternative characterization of metric-type pseudoentropy (Lemma 1) by a quantification over $D \in C_{\mathrm{compress}}(\ell)$ for all $\ell < k$. The resulting definition is the following:

Definition 3. Let X be a random variable over a set S. X has ε-Yao-type pseudoentropy at least k, denoted $H^{\mathrm{Yao}}_{\varepsilon}(X) \ge k$, if for every $\ell < k$ and every set $D \in C_{\mathrm{compress}}(\ell)$,

$$\Pr[X \in D] \le 2^{\ell - k} + \varepsilon.$$
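A small Python sketch of a compressible set and of the inequality in Definition 3, for one hypothetical compressor/decompressor pair with ℓ = 2 on 4-bit strings. Since D is contained in the image of d, $|D| \le 2^{\ell}$ always holds, which is why the bound $2^{\ell - k}$ plays the role of $|D|/2^k$ from Lemma 1.

```python
from itertools import product

def compressible_set(universe, c, d):
    """D = {x : d(c(x)) == x} for a compressor c: S -> {0,1}^l and a decompressor d: {0,1}^l -> S."""
    return {x for x in universe if d(c(x)) == x}

def yao_condition_holds(px, D, l, k, eps):
    """Definition 3's inequality for one set D in C_compress(l): Pr[X in D] <= 2^(l-k) + eps."""
    return sum(p for x, p in px.items() if x in D) <= 2 ** (l - k) + eps

# Hypothetical toy pair with l = 2 on 4-bit strings: keep the first two bits, pad with zeros.
n, l = 4, 2
universe = list(product((0, 1), repeat=n))
c = lambda x: x[:l]
d = lambda y: y + (0,) * (n - l)
D = compressible_set(universe, c, d)
print(sorted(D))  # [(0, 0, 0, 0), (0, 1, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0)]

px = {x: 1 / len(D) for x in D}  # X uniform on D
print(yao_condition_holds(px, D, l, k=4, eps=0.1))  # False: (c, d) compresses X
```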
4 Using Randomness Extractors

An extractor uses a short seed of truly random bits to extract many bits which 
are (close to) uniform. 

Definition 4 ([27]). A function $E : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ is a $(k, \varepsilon)$-extractor if for every distribution X on $\{0,1\}^n$ with $H^{\infty}(X) \ge k$, the distribution $Z = E(X, U_d)$ has $\mathrm{dist}(Z, U_m) \le \varepsilon$.
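The sketch below computes $\mathrm{dist}(E(X, U_d), U_m)$ exactly (with `fractions.Fraction`) for a given function E and a finite source. The inner-product function used as E is a swapped-in toy example, not a construction from the text, and its seed length d = n is far from the short seeds discussed next. For this E, the error on a source X works out to $\Pr[X = 0^n]/2$, which is at most $2^{-(k+1)}$ whenever $H^{\infty}(X) \ge k$.

```python
from fractions import Fraction
from itertools import product

def output_distribution(E, px, d):
    """Exact distribution of Z = E(X, U_d) for a finite source px (dict: x -> probability)."""
    out = {}
    for x, p in px.items():
        for seed in product((0, 1), repeat=d):
            z = E(x, seed)
            out[z] = out.get(z, 0) + p / 2 ** d
    return out

def dist_from_uniform(pz, m):
    """dist(Z, U_m) = (1/2) * sum over z in {0,1}^m of |Pr[Z = z] - 2^-m|."""
    u = Fraction(1, 2 ** m)
    return sum(abs(pz.get(z, 0) - u) for z in product((0, 1), repeat=m)) / 2

def inner_product(x, y):
    """Toy one-bit function <x, y> mod 2 (seed length d = n; illustration only)."""
    return (sum(a * b for a, b in zip(x, y)) % 2,)

flat = lambda A: {x: Fraction(1, len(A)) for x in A}
A_nonzero = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
A_with_zero = [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0), (0, 0, 1, 1)]

print(dist_from_uniform(output_distribution(inner_product, flat(A_nonzero), 4), 1))    # 0
print(dist_from_uniform(output_distribution(inner_product, flat(A_with_zero), 4), 1))  # 1/8
```

An extractor must satisfy the bound for every source of min-entropy at least k; the two flat sources above (both with k = 2) only probe the definition at two points.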

We remark that there are explicit (polynomial-time computable) extractors with seed length polylog(n/ε) and m = k. The reader is referred to survey papers on extractors [12,13,14]. The following standard lemma says that if a distribution X has HILL-type pseudoentropy at least k with respect to circuits, then for every randomness extractor the distribution $E(X, U_d)$ is pseudorandom.

Lemma 2. Let C be the class of polynomial-size circuits. Let X be a distribution with $H^{\mathrm{HILL}}_{\varepsilon}(X) \ge k$ and let E be a $(k, \varepsilon)$-extractor computable in time poly(n). Then $\text{comp-dist}_C(E(X, U_d), U_m) \le 2\varepsilon$.

Note that by Theorem 1 the same holds for the metric definition. Interestingly, we do not know whether this holds for Yao-type pseudoentropy. We can, however, show that this holds for the extractor of Trevisan [17]. Trevisan’s extractor $E^{\mathrm{Tr}} : \{0,1\}^n \times \{0,1\}^{O(\log^2 n / \log k)} \to \{0,1\}^{m}$ is a $(k, 1/n)$-extractor.

Lemma 3. Let C be the class of polynomial-size circuits. Let X be a distribution with $H^{\mathrm{Yao}}_{\varepsilon}(X) \ge k$. Then $\text{comp-dist}_C(E^{\mathrm{Tr}}(X, U_d), U_m) \le 2\varepsilon$.

The proof of Lemma 3 appears in the full version. Loosely speaking, the correctness proof of Trevisan’s extractor (and some later constructions, cf. [14])
shows that if the output of the extractor isn’t close to uniform, then the distribu- 
tion X can be compressed (which is impossible for a distribution of sufficiently 
high min-entropy). For the lemma, one only needs to observe that in this argu- 
ment an efficient distinguisher gives rise to an efficient compressing algorithm. 
Thus, running the extractor on an “incompressible” distribution gives a pseudo- 
random distribution. 



5 Relationships between Definitions 

5.1 Equivalence between HILL-type and Metric-Type

The difference between HILL-type and metric-type pseudoentropy is in the order of quantifiers. HILL-type requires that there exist a single “reference distribution” Y with $H^{\infty}(Y) \ge k$ such that for every D, $\mathrm{bias}_D(X,Y) \le \varepsilon$, whereas metric-type allows Y to depend on D, and only requires that for every D there exists such a Y. It immediately follows that for every class C and every X, $H^{\mathrm{Metric}}_{\varepsilon}(X) \ge H^{\mathrm{HILL}}_{\varepsilon}(X)$. In this section we show that the other direction also applies (with small losses in ε and time/size) for small circuits.
Theorem 1 (Equivalence of HILL-type and metric-type for circuits). Let X be a distribution over $\{0,1\}^n$. For every ε, δ > 0 and k, if $H^{\mathrm{Metric}}_{\varepsilon}(X) \ge k$ (with respect to circuits of size $O(ns/\delta^2)$) then $H^{\mathrm{HILL}}_{\varepsilon+\delta}(X) \ge k$ (with respect to circuits of size s).

The proof of Theorem 1 appears only in the full version. We now provide a sketch of the argument. It is sufficient to show that if $H^{\mathrm{HILL}}(X) < k$ then $H^{\mathrm{Metric}}(X) < k$. Suppose indeed that $H^{\mathrm{HILL}}(X) < k$. This implies that for every Y with $H^{\infty}(Y) \ge k$ there is a small circuit $D \in C$ such that $\mathrm{bias}_D(X,Y) > \varepsilon$.

We consider a game between two players. The “circuit player” Alice chooses a small circuit D and the “distribution player” Bob chooses a “flat” distribution Y with $H^{\infty}(Y) \ge k$.¹¹ (Note that both players have a finite number of strategies in the game.) After the choices are made, Bob pays $\mathrm{bias}_D(X,Y)$ dollars to Alice. Our assumption says that if Alice plays after Bob then she can always win ε dollars. Loosely speaking, the “min-max” theorem of [7]¹² allows us to switch the order of quantifiers and assert that Alice can guarantee the same amount even when playing first. More formally, we conclude that there exists a distribution $\mathcal{D}$ over circuits for Alice such that she expects to get ε dollars for every reply Y of Bob. Note that we were able to switch the order of quantifiers to that of the “metric” definition. We are left with the task of converting $\mathcal{D}$ into a single circuit. This is done by sampling sufficiently many circuits $D_1, \ldots, D_t$ from $\mathcal{D}$ and taking their average. By a union bound there exists a choice of $D_1, \ldots, D_t$ which is good for every distribution Y.¹³
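The sampling step can be sketched numerically. Assuming tests are given by their truth tables on $2^n$ inputs and the mixed strategy $\mathcal{D}$ by nonnegative weights (both hypothetical stand-ins for circuits), Hoeffding's inequality plus a union bound over the $2^n$ inputs gives a number t of samples that grows linearly in n and does not depend on the number of candidate distributions Y. The helper names below are hypothetical.

```python
import math
import random

def samples_needed(n, gamma, fail_prob):
    """Hoeffding: one input deviates by more than gamma with prob <= 2*exp(-2*t*gamma^2);
    a union bound over the 2^n inputs keeps the total failure probability <= fail_prob."""
    return math.ceil(math.log(2 * 2 ** n / fail_prob) / (2 * gamma ** 2))

def sampled_average(tables, weights, t, rng):
    """Average the truth tables of t tests drawn i.i.d. from the mixed strategy."""
    chosen = rng.choices(tables, weights=weights, k=t)
    return [sum(T[x] for T in chosen) / t for x in range(len(tables[0]))]

def exact_average(tables, weights):
    """E[D(x)] for every input x, where D is drawn according to the weights."""
    total = sum(weights)
    return [sum(w * T[x] for T, w in zip(tables, weights)) / total for x in range(len(tables[0]))]

# Hypothetical mixed strategy: 50 random boolean tests on n = 6 input bits.
rng = random.Random(0)
n = 6
tables = [[rng.randrange(2) for _ in range(2 ** n)] for _ in range(50)]
weights = [rng.random() for _ in tables]

t = samples_needed(n, gamma=0.1, fail_prob=1e-6)
approx = sampled_average(tables, weights, t, rng)
exact = exact_average(tables, weights)
print(t)  # 934
print(max(abs(a - e) for a, e in zip(approx, exact)) <= 0.1)  # should print True
```

The averaged function takes values in [0, 1] rather than {0, 1}; how the averaged circuits are turned into a single test is left to the full version.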

In the full version we also prove equivalence for uniform polynomial-time machines.¹⁴

5.2 Equivalence between All Types for PH-circuits 

We do not know whether the assumption that $H^{\mathrm{Yao}}(X) \ge k$ for circuits implies that $H^{\mathrm{Metric}}(X) \ge k'$ for slightly smaller k' and circuit size (and in fact, we conjecture that it is false). However, we can prove it assuming the circuits for the Yao-type definition have access to an NP-oracle.

¹¹ A “flat” distribution is a distribution which is uniformly distributed over a subset of S.

¹² There is a subtlety here. In order to apply the theorem, Alice must be able to win ε dollars even when Bob plays a mixed strategy (i.e., a convex combination of his choices). However, a convex combination of flat distributions with min-entropy k also has min-entropy k.

¹³ It is crucial that this union bound is not performed over the $\binom{2^n}{2^k}$ choices for Y but rather over the $2^n$ inputs. More precisely, we show that there exist $D_1, \ldots, D_t$ such that for all inputs x, $\frac{1}{t}\sum_i D_i(x) \approx \mathrm{E}[\mathcal{D}(x)]$.

We find this surprising because the argument above seems to exploit the non- 
uniformity of circuits: The “min-max theorem” works only for finite games and 
is non-constructive - it only shows existence of a distribntion D and gives no cine 
to its complexity. The key idea is the observation that pseudoentropy with respect 
to uniform Tnring machines implies also pseudoentropy for “slightly non-nniform” 
Turing machines. Exact details appear in the full version. 
Theorem 2. Let k' = k + 1. There is a constant c so that if $H^{\mathrm{Yao}}(X) \ge k'$ (with respect to circuits of size $\max(s, n^c)$ that use an NP-oracle) then $H^{\mathrm{Metric}}(X) \ge k$ (with respect to circuits of size s).

The proof of Theorem 2 appears in the full version. The reduction in the proof of Theorem 2 uses an NP-oracle. The class of polynomial-size PH-circuits is closed under the use of NP-oracles ($\mathrm{PH}^{\mathrm{NP}}/\mathrm{poly} = \mathrm{PH}/\mathrm{poly}$). Applying the argument of Theorem 2 gives the following corollary.

Corollary 1. Let C be the class of polynomial-size PH-circuits. If $H^{\mathrm{Yao}}(X) \ge 2k$ then $H^{\mathrm{Metric}}(X) \ge k$.

6 Separation between Types 

Given the results of the previous section, it is natural to ask whether HILL-type and metric-type pseudoentropy are equivalent in all natural computational models. We give a negative answer and prove that there is a large gap between HILL-type and metric-type pseudoentropy in the model of bounded-width read-once oblivious branching programs.

Theorem 3. For every constant ε > 0, sufficiently large $n \in \mathbb{N}$ and S ∈ ℕ as in Lemma 4, there exists a random variable X over $\{0,1\}^n$ such that $H^{\mathrm{Metric}}_{\varepsilon}(X) \ge (1 - \varepsilon)n$ with respect to width-S read-once oblivious branching programs, but $H^{\mathrm{HILL}}(X) \le \mathrm{polylog}(n, S)$ with respect to width-4 oblivious branching programs.

Theorem 3 follows from the following two lemmas, whose proofs appear in 
the full version: 

Lemma 4 (Based on [28]). Let e > 0 be some constant and S' € N such that 
S > A Let I = ^ log S and consider the distribution X = {Ui, Ui , . . . , Ui) over 
{0,1}" for some n < S which is a multiple of 1. Then, > (1 — e)n 

with respect to width-S oblivious branching programs. 

Lemma 5. Let e > 0 be some constant, and X be the random variable {Ui,Ui, 

. . . ,Ui) over jo, 1}" (where I > logn}. Then, with respect 

to width-4 oblivious branching programs. 



7 Analogs of Information-Theoretic Inequalities 

7.1 Concatenation Lemma 

A basic fact in information theory is that for every (possibly correlated) random variables X and Y, the entropy of (X, Y) is at least as large as the entropy of X. We show that if one-way functions exist then this fails for all three types of pseudoentropy with respect to polynomial-time circuits. On the other hand, we show that the fact above does hold for polynomial-size PH-circuits and for bounded-width oblivious branching programs.¹⁵

¹⁵ With respect to the latter, we only prove that concatenation holds for metric-type pseudoentropy.
Negative result for standard model. Our negative result is the following easy lemma, whose proof is omitted:

Lemma 6. Let $G : \{0,1\}^l \to \{0,1\}^n$ be a (poly-time computable) pseudorandom generator.¹⁶ Let (X, Y) be the random variables $(G(U_l), U_l)$. Then $H^{\mathrm{HILL}}_{\varepsilon}(X) = n$ (for a negligible ε) but $H^{\mathrm{Yao}}(X, Y) \le l + 1$.



Positive result for PH-circuits. Our positive result for PH-circuits is stated in 
the following lemma, whose proof appears in the full version: 

Lemma 7. Let X be a random variable over $\{0,1\}^n$ and Y be a random variable over $\{0,1\}^m$. Suppose that $H^{\mathrm{Yao}}_{\varepsilon}(X) \ge k$ with respect to s-sized PH-circuits. Then $H^{\mathrm{Yao}}_{\varepsilon}(X, Y) \ge k$ with respect to O(s)-sized PH-circuits.

Applying the results of Section 5.2, we obtain that with respect to PH-circuits, the concatenation property is also satisfied for HILL-type and metric-type pseudoentropy.

Positive result for bounded-width oblivious branching programs. We also show that the concatenation property holds for metric-type pseudoentropy with respect to bounded-width read-once oblivious branching programs. This is stated
in Lemma 8, whose proof appears in the full version. Note that the quality of this 
statement depends on the order of the concatenation (i.e., whether we consider 
(X,Y) or (Y,X)). 

Lemma 8. Let X be a random variable over $\{0,1\}^n$ and Y be a random variable over $\{0,1\}^m$. Suppose that $H^{\mathrm{Metric}}_{\varepsilon}(X) \ge k$ with respect to width-S read-once oblivious branching programs. Then $H^{\mathrm{Metric}}_{\varepsilon}(X, Y) \ge k$ and $H^{\mathrm{Metric}}(Y, X) \ge k - \log(1/\varepsilon)$ with respect to such algorithms.



7.2 Unpredictability and Entropy 

Loosely speaking, a random variable X over $\{0,1\}^n$ is δ-unpredictable if for every index i, it is hard to predict $X_i$ from $X_{[1,i-1]}$ (which denotes $X_1, \ldots, X_{i-1}$) with probability better than $\frac{1}{2} + \delta$.

Definition 5. Let X be a random variable over $\{0,1\}^n$. We say that X is δ-unpredictable in index i with respect to a class of algorithms C if for every $P \in C$, $\Pr[P(X_{[1,i-1]}) = X_i] < \frac{1}{2} + \delta$. X is δ-unpredictable if for every $P \in C$, $\Pr[P(i, X_{[1,i-1]}) = X_i] < \frac{1}{2} + \delta$, where this probability is over the choice of X and over the choice of $i \leftarrow [n]$. We also define complement unpredictability by changing $X_{[1,i-1]}$ to $X_{[n] \setminus \{i\}}$ in the definition above.
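For intuition about Definition 5 in the information-theoretic setting (C = all functions), the Python sketch below computes the best possible success probability of next-bit and complement predictors on an explicit distribution; the optimal predictor simply guesses the more likely value of $x_i$ for each view. The helper name and the examples are hypothetical; δ-unpredictability then means the returned value is below 1/2 + δ.

```python
from itertools import product

def best_prediction_success(px, n, complement=False):
    """Max over all predictors of Pr_{i <- [n], x <- X}[P(i, view) = x_i].

    view is the prefix x[:i] (next-bit prediction) or, if complement=True,
    every bit except x[i]. px maps bit tuples to probabilities.
    """
    total = 0.0
    for i in range(n):
        weight = {}  # view -> [Pr[view and x_i = 0], Pr[view and x_i = 1]]
        for x, p in px.items():
            view = x[:i] + x[i + 1:] if complement else x[:i]
            weight.setdefault(view, [0.0, 0.0])[x[i]] += p
        total += sum(max(w) for w in weight.values())  # guess the likelier bit per view
    return total / n

n = 3
uniform = {x: 1 / 2 ** n for x in product((0, 1), repeat=n)}
repeated = {(0,) * n: 0.5, (1,) * n: 0.5}  # one random bit copied n times
print(best_prediction_success(uniform, n))                    # 0.5
print(best_prediction_success(repeated, n))                   # about 0.833 (= 5/6)
print(best_prediction_success(repeated, n, complement=True))  # 1.0
```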



¹⁶ We mean here a pseudorandom generator in the “cryptographic” sense of Blum, Micali and Yao [3,2]. That is, we require that G is polynomial-time computable.
Yao’s theorem [2] says that if X is δ-unpredictable in all indices by polynomial-time (uniform or nonuniform) algorithms, then it is nδ-indistinguishable from the uniform distribution. Note that this theorem cannot be used for a constant δ > 0. This loss of a factor of n comes from the use of the “hybrid argument” [1,2]. In contrast, in the context of information theory it is known that if a random variable X is δ-unpredictable (w.r.t. all possible algorithms) for a small constant δ and for a constant fraction of the indices, then $H^{\infty}(X) \ge \Omega(n)$. Thus, in this context it is possible to extract Ω(n) bits of randomness even from δ-unpredictable distributions where δ is a constant [20].

In this section we consider the question of whether or not there exists a 
computational analog to this information-theoretic statement. 

Negative result in standard model. We observe that if one-way functions exist, then the distribution $(G(U_l), U_l)$ (where $|G(U_l)| = \omega(l)$) used in Lemma 6 is also a counterexample (when considering polynomial-time distinguishers). That is, this is a distribution that is δ-unpredictable for a negligible δ in almost all the indices, but has low pseudoentropy. We do not know whether or not there exists a distribution that is δ-unpredictable for a constant δ for all the indices, and has sublinear pseudoentropy.



Positive results. We also show some computational settings in which the information-theoretic intuition does hold. We show this for PH-circuits, and for bounded-width oblivious branching programs using the metric definition of pseudoentropy. We start by considering a special case in which the distinguisher has distinguishing probability 1 (or very close to 1).¹⁷



Theorem 4. Let X be a random variable over $\{0,1\}^n$. Suppose there exists a size-s PH-circuit (width-S oblivious branching program) D such that $|D^{-1}(1)| \le 2^k$ and $\Pr[D(X) = 1] = 1$. Then there exists a size-O(s) PH-circuit (width-S oblivious branching program) P such that $\Pr_{i \in [n],\, x \leftarrow X}[P(x_{[1,i-1]}) = x_i] \ge 1 - O(k/n)$.



The main step in the proof of Theorem 4 is the following lemma: 



Lemma 9. Let $D \subseteq \{0,1\}^n$ be a set such that $|D| \le 2^k$. For $x = x_1 \ldots x_{i-1} \in \{0,1\}^{i-1}$, we define $N_x$ to be the number of continuations of x in D (i.e., $N_x = |\{x' \in \{0,1\}^{n-i+1} : xx' \in D\}|$). We define P(x) as follows: P(x) = 1 if $N_{x1} > \frac{2}{3} N_x$, P(x) = 0 if $N_{x0} > \frac{2}{3} N_x$, and P(x) is undefined otherwise. Then, for every random variable X such that X ⊆ D (i.e., X is supported on D),

$$\Pr_{i \in [n],\, x \leftarrow X}\big[P(x_{[1,i-1]}) \text{ is defined and equal to } x_i\big] \ge 1 - O(k/n).$$



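¹⁷ Intuitively, this corresponds to applications that use the high-entropy distribution for hitting a set (like a disperser) rather than for approximation of a set (like an extractor).

Proof. For $x \in \{0,1\}^n$, we let $\mathrm{bad}(x) \subseteq [n]$ denote the set of indices $i \in [n]$ such that $P(x_{[1,i-1]})$ is either undefined or different from $x_i$. We will prove the lemma by showing that $|\mathrm{bad}(x)| \le O(k)$ for every string $x \in D$. Note that an equivalent condition is that $|D| \ge 2^{\Omega(|\mathrm{bad}(x)|)}$. Indeed, we will prove that $|D| \ge (1 + \frac{1}{2})^{|\mathrm{bad}(x)|}$. Let $N_i$ denote the number of continuations of $x_{[1,i]}$ in D (i.e., $N_i = N_{x_{[1,i]}}$). We define $N_n = 1$. We claim that for every $i \in \mathrm{bad}(x)$, $N_{i-1} \ge (1 + \frac{1}{2}) N_i$. (Note that this is sufficient to prove the lemma, since $N_0 = |D|$ and $N_{i-1} \ge N_i$ for every i.) Indeed, $N_{i-1} = N_{x_{[1,i-1]}} = N_{x_{[1,i-1]}0} + N_{x_{[1,i-1]}1}$, or in other words, $N_{i-1} = N_i + N_{x_{[1,i-1]}\bar{x}_i}$ (where $\bar{x}_i = 1 - x_i$). Yet, if $i \in \mathrm{bad}(x)$ then $N_{x_{[1,i-1]}\bar{x}_i} \ge \frac{1}{3}(N_i + N_{x_{[1,i-1]}\bar{x}_i}) = \frac{1}{3} N_{i-1}$, and therefore $N_i \le \frac{2}{3} N_{i-1}$. □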
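A brute-force Python sketch of the predictor from Lemma 9, with the 2/3 threshold compared in integer arithmetic. It assumes D is given explicitly as a set of bit tuples and counts continuations exactly (the text instead uses approximate counting in PH, or the state of a branching program); the helper names and the example sets are hypothetical, and indices are 0-based in the code.

```python
import math
import random
from itertools import product

def continuations(D, prefix):
    """N_prefix: number of strings in D that begin with prefix."""
    return sum(1 for z in D if z[:len(prefix)] == prefix)

def lemma9_predictor(D, prefix):
    """P(prefix) = 1 if N_{prefix 1} > (2/3) N_prefix, 0 if N_{prefix 0} > (2/3) N_prefix, else None."""
    N = continuations(D, prefix)
    if 3 * continuations(D, prefix + (1,)) > 2 * N:
        return 1
    if 3 * continuations(D, prefix + (0,)) > 2 * N:
        return 0
    return None  # undefined

def bad_indices(D, x):
    """bad(x): indices where P(x_[1,i-1]) is undefined or differs from x_i."""
    return [i for i in range(len(x)) if lemma9_predictor(D, x[:i]) != x[i]]

# Hypothetical example: n = 12, each of 4 free bits is written 3 times, so |D| = 2^4.
D = {tuple(b for b in bits for _ in range(3)) for bits in product((0, 1), repeat=4)}
x = next(iter(D))
print(bad_indices(D, x))                  # [0, 3, 6, 9]: the first copy of each free bit
print(math.log(len(D)) / math.log(1.5))  # the proof's bound log_{3/2}|D|, about 6.84

# Sanity check on a random D of size 50 inside {0,1}^10.
rng = random.Random(1)
R = set(rng.sample(list(product((0, 1), repeat=10)), 50))
print(all(len(bad_indices(R, y)) <= math.log(len(R)) / math.log(1.5) for y in R))  # True
```

For the structured D every string has exactly 4 bad indices, within the bound $\log_{3/2}|D|$ from the proof; the random set checks the same bound on unstructured data.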

We obtain Theorem 4 from Lemma 9 for the case of PH-circuits by observing that deciding whether P(x) is equal to 1 or 0 (in the cases that it is defined) can be done in the polynomial hierarchy (using approximate counting [29]). The case of bounded-width oblivious branching programs is obtained by observing that the state of the width-S oblivious branching program D after seeing $x_1, \ldots, x_{i-1}$ completely determines the value $P(x_1, \ldots, x_{i-1})$, and so $P(x_1, \ldots, x_{i-1})$ can be computed (nonuniformly) from this state.

We now consider the case that $\Pr_{x \leftarrow X}[x \in D] = \varepsilon$ for an arbitrary constant ε (that may be smaller than 1/2).¹⁸ In this case we are not able to use standard unpredictability, and use complement unpredictability instead.

Theorem 5. Suppose that X is δ-complement-unpredictable for a random index with respect to s-sized PH-circuits, where $0 < \delta < \frac{1}{2}$ is some constant. Let ε > δ be some constant; then $H^{\mathrm{Metric}}_{\varepsilon}(X) \ge \Omega(n)$ with respect to O(s)-sized PH-circuits.

¹⁸ Lemma 9 only gives a predictor given a distinguisher D such that $\Pr[X \in D] = 1$. However, the proof of Lemma 9 will still yield a predictor with constant bias even if 1 is replaced by any constant greater than 1/2.

Proof. We prove the theorem by the contrapositive. Let ε > δ and suppose that $H^{\mathrm{Metric}}_{\varepsilon}(X) < k$ where $k = \varepsilon' n$ (for a constant ε' > 0 that will be chosen later). This means that there exists a set $D \in C$ such that $\Pr_{x \leftarrow X}[x \in D] > \frac{|D|}{2^k} + \varepsilon$. In particular, this means that $|D| < 2^k$ and $\Pr_{x \leftarrow X}[x \in D] > \varepsilon$. We consider the following predictor P': on input $i \in [n]$ and $x = x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n \in \{0,1\}^{n-1}$, P' considers the strings $x^0, x^1$, where $x^b = x_1, \ldots, x_{i-1}, b, x_{i+1}, \ldots, x_n$. If both $x^0$ and $x^1$ are not in D, then P' outputs a random bit. If $x^b \in D$ and $x^{1-b} \notin D$, then P' outputs b. Otherwise (if $x^0, x^1 \in D$), P' outputs $P(x_1, \ldots, x_{i-1})$, where P is the predictor constructed from D in the proof of Lemma 9. Let Γ(D) denote the set of all strings x such that x ∉ D but x is at Hamming distance 1 from D (i.e., there is $i \in [n]$ such that $x_1, \ldots, x_{i-1}, 1 - x_i, x_{i+1}, \ldots, x_n \in D$). If $S \subseteq \{0,1\}^n$, then let $X_S$ denote the random variable $X \mid X \in S$. By Lemma 9, $\Pr_{i \in [n],\, x \leftarrow X_D}[P'(x_{[n]\setminus\{i\}}) = x_i] \ge 1 - O(k/n)$, while it is clear that $\Pr_{i \in [n],\, x \leftarrow X_{\{0,1\}^n \setminus (D \cup \Gamma(D))}}[P'(x_{[n]\setminus\{i\}}) = x_i] = \frac{1}{2}$. Thus if it holds that $\Pr[X \in \Gamma(D)] < \varepsilon'$ and $k < \varepsilon' n$, where ε' is some small constant (depending on ε and δ), then $\Pr_{i \in [n],\, x \leftarrow X}[P'(x_{[n]\setminus\{i\}}) = x_i] > \frac{1}{2} + \delta$ and the proof is finished.

However, it may be the case that $\Pr[X \in \Gamma(D)] \ge \varepsilon'$. In this case, we will consider the distinguisher $D' = D \cup \Gamma(D)$, and use D' to obtain a predictor in the same way we obtained P' from D. Note that $|D'| \le (n+1)|D|$ and that, using nondeterminism, the circuit size of D' is larger than the circuit size of D by at most an additive O(log n) term. Since each such step increases $\Pr[X \in D]$ by at least ε', we will need to repeat this process for at most 1/ε' steps, to obtain a distinguisher $D^{(c)}$ (where $c \le 1/\varepsilon'$) such that $|D^{(c)}| \le (n+1)^c 2^k$, $\Pr[X \in D^{(c)}] \ge \varepsilon$, and $\Pr[X \in \Gamma(D^{(c)})] < \varepsilon'$. The corresponding predictor will satisfy $\Pr_{i \in [n],\, x \leftarrow X}[P^{(c)}(x_{[n]\setminus\{i\}}) = x_i] > \frac{1}{2} + \delta$, thus proving the theorem. □
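The predictor P' and the boundary Γ(D) from the proof of Theorem 5 can be sketched the same way (Python, D given explicitly as a set of bit tuples; all names hypothetical). The Lemma 9 predictor is repeated so the block runs on its own; returning `None` for an undefined prediction is a convention of this sketch, to be counted as a miss.

```python
import random

def continuations(D, prefix):
    """Number of strings in D that begin with prefix."""
    return sum(1 for z in D if z[:len(prefix)] == prefix)

def next_bit_predictor(D, prefix):
    """Lemma 9's predictor (2/3 threshold); None means undefined."""
    N = continuations(D, prefix)
    if 3 * continuations(D, prefix + (1,)) > 2 * N:
        return 1
    if 3 * continuations(D, prefix + (0,)) > 2 * N:
        return 0
    return None

def complement_predictor(D, i, rest, rng):
    """P' on index i, where rest is x with position i removed (n - 1 bits)."""
    x0 = rest[:i] + (0,) + rest[i:]
    x1 = rest[:i] + (1,) + rest[i:]
    in0, in1 = x0 in D, x1 in D
    if not in0 and not in1:
        return rng.randrange(2)             # no information: random bit
    if in0 != in1:
        return 1 if in1 else 0              # exactly one completion lies in D
    return next_bit_predictor(D, rest[:i])  # both in D: use Lemma 9 on the prefix

def boundary(D, n):
    """Gamma(D): strings outside D at Hamming distance 1 from some string of D."""
    out = set()
    for z in D:
        for i in range(n):
            y = z[:i] + (1 - z[i],) + z[i + 1:]
            if y not in D:
                out.add(y)
    return out

n = 4
D = {(0,) * n, (1,) * n}
rng = random.Random(0)
print(len(boundary(D, n)))  # 8: the strings of Hamming weight 1 or 3
print(all(complement_predictor(D, i, x[:i] + x[i + 1:], rng) == x[i]
          for x in D for i in range(n)))  # True
```

One step of the iteration in the proof corresponds to replacing `D` with `D | boundary(D, n)`.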

Acknowledgements We thank Oded Goldreich and the RANDOM 2003 
referees for helpful comments. 
Abstract. We present upper bounds on the size of codes that are locally testable by querying only two input symbols. For linear codes, we show that any 2-locally testable code with minimal distance δn over any finite
field F cannot have more than codewords. This result holds even 

for testers with two-sided error. For general (non-linear) codes we obtain 
the exact same bounds on the code size as a function of the minimal 
distance, but our bounds apply only for binary alphabets and one-sided 
error testers (i.e., with perfect completeness). Our bounds are obtained by
examining the graph induced by the set of possible pairs of queries made 
by a codeword tester on a given code. We also demonstrate the tightness 
of our upper bounds and the essential role of certain parameters. 



1 Introduction 

Locally testable codes are error-correcting codes that admit very efficient code- 
word testers. Specifically, using a constant number of (random) queries, non- 
codewords are rejected with probability proportional to their distance from the 
code. 

Locally testable codes arise naturally from the study of probabilistically 
checkable proofs, and were explicitly defined in [5] and systematically studied in 
[7]. The task of testing a code locally may also be viewed as a special case of 
the general task of property testing initiated by [9,6], where the property being 
tested here is that of being a codeword. In this paper we explore codes that can 
be tested with a constant number of queries.

We focus on codes $C \subseteq \Sigma^n$ that have large distance (i.e., each pair of codewords differ in at least $\Omega(n)$ coordinates) and large size (i.e., at the very least, $|C|$ should grow with $n$ and $|\Sigma|$). Such codes are known to exist. Specifically, in [7] locally testable codes are shown such that $|C| = |\Sigma|^k$ for $k = \ldots$. We highlight two of these results:
1. For $\Sigma = \{0,1\}$, three queries are shown to suffice. Furthermore, these codes are linear.

2. For $|\Sigma| > 2$, two queries are shown to suffice.¹

This raises the question of whether binary codes and/or linear codes can have 
codeword tests that make only two queries. In this paper, we show that the 
answer is essentially negative; that is, for codes of linear distance, such codes 
can contain only a constant number of codewords. More general statements are 
provided by Theorems 3.1 and 4.1, which address linear codes over arbitrary 
fields and non-linear binary codes, respectively. We also address the tightness of 
our upper-bounds and the essential role of certain parameters (i.e., our upper- 
bounds apply either to linear codes or to binary codes that have a tester of 
perfect completeness). 

Organization: In Section 2 we present the main definitions used in this paper, and 
state our main results. In Section 3 we study linear codes that admit two-query 
codeword testers. In Section 4 we study general binary codes that admit two- 
query codeword testers of perfect completeness. Due to space considerations, the 
rest of our results appear only in our technical report [3]: In [3, Sec. 5] we show
that our upper-bounds cease to hold for ternary non-linear codes (rather than for 
non-linear codes over much larger alphabets as considered in [7] and mentioned 
in Item 2 above). In [3, Sec. 6] we show that perfect completeness is essential for 
the results regarding non-linear binary codes (presented in Section 4). 

2 Formal Setting 

We consider words over an alphabet $\Sigma$. For $w \in \Sigma^n$ and $i \in [n]$, we denote by $w_i$ the $i$-th symbol of $w$; that is, $w = w_1 \cdots w_n$.

2.1 Codes 

We consider codes $C \subseteq \Sigma^n$ over a finite alphabet $\Sigma$. The blocklength of $C$ is $n$, and the size of $C$ is its cardinality $|C|$. We use normalized Hamming distance as our distance measure; that is, for $u, v \in \Sigma^n$ the distance $\Delta(u,v)$ is defined as the number of locations on which $u$ and $v$ differ, divided by $n$ (i.e., $\Delta(u,v) = |\{i : u_i \neq v_i\}|/n$). The relative minimal distance of a code, denoted $\delta(C)$, is the minimal normalized Hamming distance between two distinct codewords. Formally,

$$\delta(C) = \min_{u \neq v \in C} \{\Delta(u,v)\}.$$

The distance of a word $w$ from the code, denoted $\Delta(w,C)$, is $\min_{v \in C}\{\Delta(w,v)\}$.
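To make these definitions concrete, here is a minimal Python sketch that computes $\Delta(u,v)$, $\delta(C)$ and $\Delta(w,C)$ by brute force. It is an illustration, not code from the paper. It assumes that words are equal-length strings and that the code is given as an explicit list. The toy code `C` and the function names are hypothetical. Exact `Fraction` arithmetic avoids rounding in the normalized distances.

```python
from fractions import Fraction
from itertools import combinations

def dist(u, v):
    """Normalized Hamming distance Delta(u, v)."""
    assert len(u) == len(v)
    return Fraction(sum(a != b for a, b in zip(u, v)), len(u))

def min_rel_distance(code):
    """delta(C): the minimum of Delta(u, v) over distinct codewords u, v."""
    return min(dist(u, v) for u, v in combinations(code, 2))

def dist_to_code(w, code):
    """Delta(w, C): the distance from w to its closest codeword."""
    return min(dist(w, v) for v in code)

# Hypothetical toy binary code of blocklength 6.
C = ["000000", "111000", "000111", "111111"]
print(min_rel_distance(C))        # 1/2
print(dist_to_code("110000", C))  # 1/6
```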

¹ We comment that these codes are “linear” in a certain sense. Specifically, $\Sigma$ is a vector space over a field $\mathbb{F}$, and the code is a linear subspace over $\mathbb{F}$ (rather than over $\Sigma$). That is, if $\Sigma = \mathbb{F}^{\ell}$ then $C \subseteq \Sigma^n$ is a linear subspace of $\mathbb{F}^{\ell n}$ (but not of $\Sigma^n$, no matter what finite field we associate with $\Sigma$). In the coding literature such codes are called $\mathbb{F}$-linear.
A code is called redundant if its projection on some coordinate is constant (i.e., there exists $i \in \{1, \ldots, n\}$ such that for any two codewords $w, w'$ it holds that $w_i = w'_i$). A redundant code can be projected on all non-redundant coordinates, yielding a code with the same size and (absolute) distance, but smaller blocklength. Thus, w.l.o.g., we assume all codes to be non-redundant.
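A minimal sketch of the projection onto non-redundant coordinates follows. It assumes, for illustration only, that codewords are equal-length strings. The helper names and the toy code are hypothetical.

```python
def redundant_coordinates(code):
    """Coordinates on which all codewords carry the same symbol."""
    n = len(code[0])
    return [i for i in range(n) if len({w[i] for w in code}) == 1]

def project_non_redundant(code):
    """Drop constant coordinates. Distinct codewords agree on every dropped
    coordinate, so they remain distinct and their (absolute) Hamming
    distances are unchanged."""
    drop = set(redundant_coordinates(code))
    keep = [i for i in range(len(code[0])) if i not in drop]
    return ["".join(w[i] for i in keep) for w in code]

C = ["0011", "0110", "0101"]      # coordinate 0 is constant
print(redundant_coordinates(C))   # [0]
print(project_non_redundant(C))   # ['011', '110', '101']
```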

Typically (in this paper) $\Sigma$ is a finite field $\mathbb{F}$ and we view $\mathbb{F}^n$ as a vector space over $\mathbb{F}$. In particular, for $u, v \in \mathbb{F}^n$ the inner product of the two is $\langle u, v\rangle = \sum_{i=1}^{n} u_i v_i$ (where arithmetic operations are in $\mathbb{F}$). The weight of $v \in \mathbb{F}^n$, denoted $\mathrm{wt}(v)$, is the number of non-zero elements in $v$. In this case $\Delta(u,v) = \mathrm{wt}(u - v)/n$.
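The sketch below illustrates the inner product, the weight, and the identity $\Delta(u,v) = \mathrm{wt}(u-v)/n$. For simplicity it assumes that $\mathbb{F}$ is a prime field $GF(p)$, here $p = 5$, so field arithmetic is integer arithmetic mod $p$. Extension fields would need a different representation. The function names are hypothetical.

```python
P = 5  # assumption: the prime field GF(5), so all arithmetic is mod P

def inner(u, v, p=P):
    """<u, v> = sum_i u_i * v_i, computed in GF(p)."""
    return sum(a * b for a, b in zip(u, v)) % p

def wt(v):
    """Number of non-zero entries of v (entries assumed reduced mod p)."""
    return sum(1 for a in v if a != 0)

def sub(u, v, p=P):
    """Coordinate-wise difference u - v in GF(p)."""
    return [(a - b) % p for a, b in zip(u, v)]

u = [1, 2, 0, 4]
v = [1, 3, 3, 4]
print(inner(u, v))     # (1 + 6 + 0 + 16) mod 5 = 3
print(wt(sub(u, v)))   # u - v = [0, 4, 2, 0], so the weight is 2
```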

2.2 Testers and Tests 

By a codeword tester (or simply tester) with query complexity $q$, completeness $c$ and soundness $s$ (for the code $C \subseteq \Sigma^n$) we mean a randomized oracle machine that, given oracle access to $w \in \Sigma^n$ (viewed as a function $w : \{1, \ldots, n\} \to \Sigma$), satisfies the following three conditions:

- Query Complexity $q$: The tester makes at most $q$ queries to $w$.

- Completeness: For any $w \in C$, given oracle access to $w$ the tester accepts with probability at least $c$.

- Soundness: For any $w$ that is at relative distance at least $\delta(C)/3$ from $C$, given oracle access to $w$, the tester accepts with probability at most $s$.²

If $C$ has a codeword tester with query complexity $q$, completeness $c$ and soundness $s$ we say $C$ is $[q,c,s]$-locally testable.

A deterministic test (or simply test) with query complexity $q$ is a deterministic oracle machine that, given oracle access to $w \in \Sigma^n$, makes at most $q$ queries to $w$, and outputs 1 (= accept) or 0 (= reject). Any (randomized) tester can be described as a distribution over deterministic tests, and we adopt this view throughout the text.
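Viewing a randomized tester as a distribution over deterministic tests can be sketched as follows. This is a hypothetical toy tester for the binary repetition code $\{0000, 1111\}$, not a construction from the paper. Each of its deterministic tests compares two distinct positions, and the acceptance probability is computed exactly as a weighted sum over the tests.

```python
from fractions import Fraction

def equality_test(i, j):
    """A deterministic, non-adaptive 2-query test: accept iff w[i] == w[j]."""
    return lambda w: 1 if w[i] == w[j] else 0

def acceptance_probability(tester, w):
    """tester: list of (probability, deterministic test) pairs."""
    return sum(prob * test(w) for prob, test in tester)

# Hypothetical tester for the repetition code {0000, 1111}: a uniformly
# random pair of distinct positions is compared.
n = 4
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
tester = [(Fraction(1, len(pairs)), equality_test(i, j)) for i, j in pairs]

print(acceptance_probability(tester, "1111"))  # 1 (codewords always accepted)
print(acceptance_probability(tester, "0011"))  # 1/3
```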

A (deterministic) test is called adaptive if its queries depend on previous 
answers of the oracle, and otherwise it is called non-adaptive. A test has perfect 
completeness if it accepts all codewords. Both notions extend to (randomized) 
testers. Alternatively, we say that a tester is non-adaptive (resp., has perfect 
completeness) if all the deterministic tests that it uses are non-adaptive (resp., 
have perfect completeness), and otherwise it is adaptive (resp., has non-perfect completeness).

2.3 Our Results 

We study 2-query codeword testers. Our main results are upper-bounds on the 
sizes of linear (resp., binary) codes admitting such testers (resp., testers of perfect 
completeness): 

² We have set the detection radius of the tester at a third of its distance (i.e., for any $w$ whose distance from $C$ is at least $\frac{1}{3}\cdot\delta(C)$ the test rejects with probability at least $1 - s$). As will be evident from the proofs, our results hold for any radius less than half the distance.
Theorem 2.1 For any constants $c > s$, any $[2,c,s]$-locally testable linear code over $\mathbb{F}$ has at most $|\mathbb{F}|^{3/\delta}$ codewords, where $\delta$ is its relative distance.

Theorem 2.2 For any constant $s < 1$, any $[2,1,s]$-locally testable binary code has at most $2^{3/\delta}$ codewords, where $\delta$ is its relative distance.

In contrast to the above, we state the following facts:

1. The upper-bounds stated in Theorems 2.1 and 2.2 are reasonably tight: For some constants $s < 1$ and $\delta > 0$, and every finite field $\mathbb{F}$, there exists a linear $[2,1,s]$-locally testable code of size $|\mathbb{F}|^{1/\delta}$ and minimal relative distance $\delta$ over $\mathbb{F}$ (see Proposition 3.6).

2. Linearity of the code is essential to Theorem 2.1 and a binary alphabet is essential to Theorem 2.2: there exist good non-linear codes over ternary alphabets that have 2-query codeword testers (of perfect completeness). That is, for some constants $s < 1$ and $\delta > 0$, there exists a $[2,1,s]$-locally testable ternary code of relative distance $\delta$ that has size that grows almost linearly with the blocklength (see [3, Thm. 5.6]).

3. Perfect completeness is essential to Theorem 2.2: there exist good non-linear codes over binary alphabets that have 2-query codeword testers of non-perfect completeness. That is, for some constants $c > s > 0$ and $\delta > 0$, there exists a $[2,c,s]$-locally testable binary code of relative distance $\delta$ that has size that grows almost linearly with the blocklength (see [3, Thm. 6.1]).

4. Regarding the difference between linearity and “semi-linearity” (as in Footnote 1), we note that there exist good GF(2)-linear codes over $\{0,1\}^{\ell}$ that have 2-query codeword testers (of perfect completeness) (see [3, Thm. 5.7]).

We mention that some of our results are analogous to results regarding probabilistically checkable proof (PCP) systems. In particular, let $\mathcal{PCP}^{\Sigma}_{c,s}[\log, q]$ denote the class of languages having PCP systems with logarithmic randomness, making $q$ queries to oracles over the alphabet $\Sigma$, and having completeness and soundness bounds $c$ and $s$ respectively. Then, it is known that $\mathcal{PCP}^{\{0,1\}}_{1,s}[\log, 2] = \mathcal{P}$ for every $s < 1$, whereas $\mathcal{PCP}^{\{0,1\}}_{c,s}[\log, 2] = \mathcal{NP}$ for some $c > s > 0$ and $\mathcal{PCP}^{\{0,1,2\}}_{1,s}[\log, 2] = \mathcal{NP}$ for some $s < 1$.³ Following [7], we warn that the translation between PCPs and locally-checkable codes is not obvious. In particular, we do not know whether it is possible to obtain our coding results from the known PCP results or the other way around.

3 Linear Codes 

In this section we show that $[2,c,s]$-locally testable linear codes with constant minimal relative distance must have very small size. Throughout this section $\mathbb{F}$ is a finite field of size $|\mathbb{F}|$. A code $C \subseteq \mathbb{F}^n$ is called linear if it is a linear subspace of $\mathbb{F}^n$. The main result of this section is the following.



³ The first two results are proven in [2], whereas the third result is folklore that is based on the NP-hardness of approximating Max3SAT as established in [1].
Theorem 3.1 (Theorem 2.1, restated): Let $C \subseteq \mathbb{F}^n$ be a $[2,c,s]$-locally testable linear code with minimal relative distance $\delta$. If $c > s$ then

$$|C| \leq |\mathbb{F}|^{3/\delta}.$$

We start by pointing out that, when considering testers for linear codes, the 
tester can be assumed to be non-adaptive and with perfect completeness. This 
holds by the following result of [4]. 

Theorem 3.2 [4]: If a linear code (over any finite field) is [q, c, s]-locally testable 
using an adaptive tester, then it is $[q, 1, 1 - (c - s)]$-locally testable using a non-adaptive tester.

Notice that if we start off with a tester having completeness greater than sound- 
ness (c > s), then the resulting non-adaptive, perfect-completeness tester (guar- 
anteed by Theorem 3.2) will have soundness strictly less than 1. Thus, in order 
to prove Theorem 3.1 it suffices to show the following. 

Theorem 3.3 Let $C \subseteq \mathbb{F}^n$ be a $[2,1,s]$-locally (non-adaptively) testable linear code, with $s < 1$, and let the minimal relative distance be $\delta$. Then:

$$|C| \leq |\mathbb{F}|^{3/\delta}.$$

In the rest of the section we prove Theorem 3.3. The proof idea is as follows. 
Each possible test of query complexity 2 and perfect completeness imposes a 
constraint on the code, because all codewords must pass the test. Thus, we 
view the n codeword coordinates as variables and the set of tests as inducing 
constraints on these variables (i.e., codewords correspond to assignments (to 
the variables) that satisfy all these constraints). Since the code is linear, each 
test imposes a linear constraint on the pair of variables queried by it. (A linear 
constraint on the variables $x, y$ has the form $ax + by = 0$ for some fixed $a, b \in \mathbb{F}$.)
We will show that in a code of large distance, these constraints induce very few 
satisfying assignments. Specifically, we look at the graph in which the vertices 
are the (n) codeword-coordinates (or variables) and edges connect two vertices 
that share a test. The main observation is that in any codeword, the values of 
all variables in a connected component are determined by the value of any one 
variable in the component; that is, the assignment to a single variable determines 
the assignment to the whole component. By perfect completeness, any word that 
satisfies all constraints in all connected components will pass all tests. Hence 
there cannot be many variables in small connected components, for then we 
could find a word that is far from the code and yet is accepted with probability 
1. But this means that the code is essentially determined by the (small number 
of) large connected components, and hence the size of the code is small. We now 
give the details, starting with a brief discussion of dual codes which is followed 
by the proof. 
3.1 Linearity Tests and Dual Codes

Recall that $C \subseteq \mathbb{F}^n$ is linear iff for all $u, v \in C$ and $\alpha \in \mathbb{F}$ we have $u + v \in C$ and $\alpha u \in C$. In this case $\delta(C) = \min_{w \in C \setminus \{0^n\}}\{\mathrm{wt}(w)/n\}$. As pointed out in [8], codeword tests for linear codes are intimately related to the “dual” of the code. For a linear code $C$, the dual code $C^{\perp}$ is defined as the subspace of $\mathbb{F}^n$ orthogonal to $C$, i.e.,

$$C^{\perp} = \{v : v \perp C\},$$

where $v \perp C$ iff for all $u \in C$, $v \perp u$ (recall $v \perp u$ iff $\langle v, u\rangle = 0$).
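For small parameters, the dual code can be computed by brute force. The sketch below assumes a prime field $GF(p)$ and explicit generators. Both the generators and the helper names are hypothetical. In the example, the only weight-2 dual vector is $(1,1,0,0)$. It corresponds to the 2-query test “$w_0 = w_1$”, with positions indexed from 0.

```python
from itertools import product

def span(generators, p):
    """All GF(p)-linear combinations of the generators (brute force)."""
    n = len(generators[0])
    return {tuple(sum(c * g[i] for c, g in zip(coeffs, generators)) % p
                  for i in range(n))
            for coeffs in product(range(p), repeat=len(generators))}

def dual(code, n, p):
    """C^perp = {v in GF(p)^n : <v, u> = 0 for every u in C}."""
    return {v for v in product(range(p), repeat=n)
            if all(sum(a * b for a, b in zip(v, u)) % p == 0 for u in code)}

C = span([(1, 1, 1, 0), (0, 0, 1, 1)], p=2)
D = dual(C, 4, 2)
print(sorted(C))  # [(0, 0, 0, 0), (0, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 0)]
print(sorted(v for v in D if sum(x != 0 for x in v) == 2))  # [(1, 1, 0, 0)]
```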

The support of a vector $v$, denoted $\mathrm{Supp}(v)$, is the set of indices of non-zero entries. Similarly, the support of a test $T$ is the set of indices it queries. Notice that a non-adaptive test with query complexity $q$ has support size at most $q$. For $v, u \in \mathbb{F}^n$ we say that $v$ covers $u$ if $\mathrm{Supp}(v) \supseteq \mathrm{Supp}(u)$. A test is called trivial if it always accepts. Elementary linear algebra gives the following claim.

Proposition 3.4 The support of any non-trivial perfect-completeness test for $C$ covers an element of $C^{\perp} \setminus \{0^n\}$.

Proof: Let $T$ be a test and $C_T$ be the projection of (the linear space) $C$ onto $\mathrm{Supp}(T)$. The projection is a linear operator, so $C_T$ is a linear space over $\mathbb{F}$. The linear space $C_T$ must be a strict subspace of $\mathbb{F}^{\mathrm{Supp}(T)}$. Indeed, $|C_T| = |\mathbb{F}|^{|\mathrm{Supp}(T)|}$ (i.e., $C_T$ includes all vectors in $\mathbb{F}^{\mathrm{Supp}(T)}$) implies that either $T$ rejects some valid codeword in $C$ (in violation of perfect completeness) or $T$ always accepts (in violation of non-triviality). It follows that $(C_T)^{\perp}$ has a non-zero element, denoted $w$, which we extend by zeros outside $\mathrm{Supp}(T)$ to a vector in $\mathbb{F}^n$. Then $\mathrm{Supp}(w) \subseteq \mathrm{Supp}(T)$ and $w \in C^{\perp}$, completing the proof. □

Clearly one can assume that all tests used by a tester are non-trivial. We also assume $C^{\perp}$ has no element of weight 1, because otherwise $C$ is redundant. Since we consider only testers that make two queries, it follows that all tests they use have support size exactly two. Furthermore, without loss of generality, all the tests are linear.⁴

3.2 Upper Bounds on Code Size 

By the above discussion (i.e., end of Section 3.1), we may assume (w.l.o.g.) that the $[2,1,s]$-tester for $C$ is described by a distribution over

$$C_2^{\perp} = \{v \in C^{\perp} : \mathrm{wt}(v) = 2\}.$$

The test corresponding to $v \in C_2^{\perp}$ refers to the orthogonality of $v$ and the oracle $w$; that is, the test accepts $w$ if $v \perp w$ and rejects otherwise.⁵ We now look at $C_2^{\perp}$ and bound the size of $(C_2^{\perp})^{\perp}$. Our theorem will follow because $C \subseteq (C_2^{\perp})^{\perp}$.

⁴ In general, without loss of generality, a one-sided tester for a property $P$ accepts $y$ if and only if its view of $y$ is consistent with its view of some $x \in P$. In our case $P$ is a linear space, so consistency means satisfying a linear system. For further details see the Appendix.

⁵ Notice that since $\mathrm{wt}(v) = 2$ such a test amounts to two queries into $w$.
The set $C_2^{\perp}$ gives rise to a natural graph, denoted $G_C$. The vertex set of $G_C$ is $V(G_C) = \{1, \ldots, n\}$ and $(i,j) \in E(G_C)$ iff there exists $v_{ij} \in C_2^{\perp}$ with $\mathrm{Supp}(v_{ij}) = \{i, j\}$.

The key observation is that, for any edge $(i,j) \in E(G_C)$, there is some $c_{ij} \in \mathbb{F} \setminus \{0\}$ such that for any $w \in C$ it holds that $w_i = c_{ij} \cdot w_j$. To see this, notice that the constraint corresponding to $(i,j)$ can be written as $a_{ij} w_i + b_{ij} w_j = 0$, where $a_{ij}, b_{ij} \in \mathbb{F} \setminus \{0\}$ (if either $a_{ij}$ or $b_{ij}$ is 0 then $v_{ij}$ has support size one, meaning $C$ is redundant); hence $c_{ij} = -b_{ij}/a_{ij}$. So, by transitivity, the value of $w$ on all variables in the connected component of $i$ is determined by $w_i$. (Moreover, all these values are non-zero iff $w_i \neq 0$.) Assuming that the number of connected components is $k$, this implies that there can be at most $|\mathbb{F}|^k$ different codewords, because there are only $k$ degrees of freedom: one setting per component determines all of its variables. To derive the desired bound we partition the components into big and small ones, and bound the number of codewords as a function of the number of big components (while showing that the small components do not matter).
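The key observation can be sketched in code. The sketch builds $G_C$ from the weight-2 dual vectors, runs a breadth-first search in each connected component, and records for every coordinate the multiplier that relates it to the component's root. It is a hedged illustration. It assumes a prime field $GF(p)$ and consistent constraints, as guaranteed by non-redundancy. It also enumerates the at most $p^k$ words determined by the $k$ root values. All names and the example constraints are hypothetical.

```python
from collections import defaultdict, deque
from itertools import product

def components_with_ratios(n, weight2_duals, p):
    """Build G_C from weight-2 dual vectors over GF(p) (p prime) and express
    every coordinate x as ratio[x] * (value at its component's root).
    Assumes the constraints are consistent (non-redundant code)."""
    adj = defaultdict(list)
    for v in weight2_duals:
        i, j = [k for k in range(n) if v[k] % p != 0]
        a, b = v[i], v[j]
        # a*w_i + b*w_j = 0  =>  w_j = (-a/b) * w_i  and  w_i = (-b/a) * w_j
        adj[i].append((j, (-a * pow(b, -1, p)) % p))
        adj[j].append((i, (-b * pow(a, -1, p)) % p))
    ratio, comps = {}, []
    for root in range(n):
        if root in ratio:
            continue
        ratio[root] = 1
        comp, queue = [root], deque([root])
        while queue:  # BFS over the connected component of root
            x = queue.popleft()
            for y, c in adj[x]:
                if y not in ratio:
                    ratio[y] = (c * ratio[x]) % p
                    comp.append(y)
                    queue.append(y)
        comps.append(sorted(comp))
    return comps, ratio

def all_consistent_words(n, comps, ratio, p):
    """One word per choice of root values: p**k words for k components."""
    words = []
    for roots in product(range(p), repeat=len(comps)):
        w = [0] * n
        for comp, r in zip(comps, roots):
            for x in comp:
                w[x] = (ratio[x] * r) % p
        words.append(tuple(w))
    return words

duals = [(1, 1, 0, 0, 0), (0, 1, 2, 0, 0), (0, 0, 0, 1, 1)]  # over GF(3)
comps, ratio = components_with_ratios(5, duals, 3)
words = all_consistent_words(5, comps, ratio, 3)
print(comps)       # [[0, 1, 2], [3, 4]]
print(len(words))  # 9 = 3**2
```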

Let $C_1, \ldots, C_k$ be the connected components of $G_C$. We call a component small if its cardinality is less than $\delta n/3$. Without loss of generality, let $C_1, \ldots, C_s$ be all the small components, and let $S = \bigcup_{i=1}^{s} C_i$ denote their union.

Claim 3.5 $|S| < 2\delta n/3$.

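Proof: Otherwise there exists $I \subseteq \{1, \ldots, s\}$ such that

$$\frac{\delta n}{3} \leq \sum_{i \in I} |C_i| < \frac{2\delta n}{3}$$

(add small components one at a time until their total size first reaches $\delta n/3$; since each has size less than $\delta n/3$, the total is then still below $2\delta n/3$). For every $i \in I$, we consider a vector $w^{(i)} \in (C_2^{\perp})^{\perp}$ with $\mathrm{Supp}(w^{(i)}) = C_i$. To see that such a vector exists, set an arbitrary coordinate of $C_i$ to 1 (which is possible because the code is not redundant) and force non-zero values to all other coordinates in $C_i$ (by virtue of the above discussion). Furthermore, note that this leaves all coordinates out of $C_i$ at zero. The resulting $w^{(i)}$ satisfies all tests in $C_2^{\perp}$: the tests that correspond to the edges in $C_i$ are satisfied by our setting of the non-zero values, whereas all other tests refer to vertices out of $C_i$ and are satisfied by zero values. Now, define $w = \sum_{i \in I} w^{(i)}$. By definition, we have $\mathrm{Supp}(w) = \bigcup_{i \in I} C_i$, and $\delta n/3 \leq \mathrm{wt}(w) < 2\delta n/3$ follows by the hypothesis. Hence, $\Delta(w, C) \geq \delta/3$. Indeed, $\Delta(w, 0^n) = \mathrm{wt}(w)/n \geq \delta/3$, and for every non-zero codeword $u$ we have $\Delta(w, u) \geq (\mathrm{wt}(u) - \mathrm{wt}(w))/n > \delta - 2\delta/3 = \delta/3$.

On the other hand, $w$ is orthogonal to $C_2^{\perp}$. To see this, consider any $v \in C_2^{\perp}$. If $\mathrm{Supp}(v) \subseteq C_i$ for some $i \in I$, then the “view $v$ has of $w$” (i.e., the values of the coordinates $v$ queries) is identical to the view $v$ has of the vector $w^{(i)}$, and so $\langle v, w\rangle = \langle v, w^{(i)}\rangle = 0$. Otherwise (i.e., $\mathrm{Supp}(v)$ has empty intersection with $\bigcup_{i \in I} C_i$), $v$ “sees” only zeros by definition, and so $\langle v, w\rangle = 0$.

We conclude that $w$ is $\frac{\delta}{3}$-far from $C$, yet it passes all tests in $C_2^{\perp}$. This contradicts the soundness condition, and the claim follows. □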
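The first step of the proof, the existence of $I$, follows from a simple greedy argument: add small components one at a time until the total first reaches $\delta n/3$. The sketch below illustrates it with hypothetical names and component sizes.

```python
from fractions import Fraction

def choose_index_set(sizes, n, delta):
    """Greedy choice of I from the proof of Claim 3.5. `sizes` lists the
    sizes of small components (each < delta*n/3). Returns (I, total) with
    delta*n/3 <= total < 2*delta*n/3, or None if the small components are
    too few in total for such an I to exist."""
    low = Fraction(delta) * n / 3
    chosen, total = [], 0
    for idx, size in enumerate(sizes):
        assert size < low, "only small components are allowed"
        chosen.append(idx)
        total += size
        if total >= low:
            # Before this step total < low and size < low, so total < 2*low.
            return chosen, total
    return None

print(choose_index_set([10, 4, 7, 9], n=90, delta=Fraction(1, 2)))  # ([0, 1, 2], 21)
```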

Proof (of Theorem 3.3): Assume for the sake of contradiction that

$$|C| > |\mathbb{F}|^{3/\delta}.$$

Recall that (by the “key observation”) the values of all variables in a connected component are determined by the value of a single variable in this component. There are at most $3/\delta$ large connected components in $G_C$, because each has cardinality at least $\delta n/3$, and hence at most $|\mathbb{F}|^{3/\delta}$ possible settings of their variables. So the contradiction hypothesis implies that there exist two codewords $x \neq y$ that agree on all variables that reside in the large connected components. Indeed, these two codewords $x \neq y$ may differ on variables that reside in the small connected components (i.e., variables in $S$), but Claim 3.5 says that there are few such variables (i.e., $|S| < 2\delta n/3$). By linearity $x - y \in C$ (but $x - y \neq 0^n$), and so $0 < \mathrm{wt}(x - y) \leq |S| < \delta n$. We have reached a contradiction (because $C$ has distance $\delta$), and Theorem 3.3 follows. □



3.3 Tightness of the Upper Bound 

We remark that our upper bound is quite tight. For any $\delta < 1$, consider the following code $C_n \subseteq \mathbb{F}^n$, formed by taking $1/\delta$ elements of $\mathbb{F}$ and repeating each one of them $\delta n$ times. Thus, a codeword in $C_n$ is formed of $1/\delta$ blocks, each block of the form $e^{\delta n}$ for some $e \in \mathbb{F}$ (here $e^k$ means $k$ repetitions of $e$).

Proposition 3.6 $C_n$ is a linear $[2, 1, 1 - \frac{2\delta}{3|\mathbb{F}|}]$-locally testable code with minimal relative distance $\delta$ and size $|\mathbb{F}|^{1/\delta}$.

For instance, taking $\mathbb{F} = GF(2)$, the soundness parameter in the proposition is $1 - \delta/3$.

Proof: The linearity, distance and size of $C_n$ are self-evident. Consider the following natural tester for $C_n$: Select a random block, read two random elements in it, and accept iff the two are equal. This tester has perfect completeness and query complexity 2. As to the soundness, let $k = 1/\delta$ and write $v \in \mathbb{F}^n$ as $(x^{(1)}, \ldots, x^{(k)})$, where $x^{(i)}$ is the $i$-th block of $v$ (i.e., $|x^{(i)}| = \delta n$). The Hamming distance of $v$ from $C_n$ is the sum of the Hamming distances of the individual blocks from the code $B = \{e^{\delta n} : e \in \mathbb{F}\}$.

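Suppose $v$ has relative distance at least $\delta/3$ from $C_n$. Let $\delta_i$ denote the relative distance of $x^{(i)}$ from $B$. Then $\frac{1}{k}\sum_{i=1}^{k} \delta_i \geq \delta/3$ (and $\delta_i \leq 1 - \frac{1}{|\mathbb{F}|}$). In block $i$ the most frequent symbol occupies a $(1 - \delta_i)$ fraction of the positions and all other symbols together occupy a $\delta_i$ fraction. So two random positions of the block agree with probability at most $(1-\delta_i)^2 + \delta_i^2$. Hence the acceptance probability of the tester is at most

$$\frac{1}{k}\sum_{i=1}^{k}\left(\delta_i^2 + (1-\delta_i)^2\right) = 1 - \frac{2}{k}\sum_{i=1}^{k}(1-\delta_i)\,\delta_i \leq 1 - \frac{2}{|\mathbb{F}|\,k}\sum_{i=1}^{k}\delta_i \leq 1 - \frac{2\delta}{3|\mathbb{F}|},$$

where the first inequality in the display is due to $\delta_i \leq 1 - \frac{1}{|\mathbb{F}|}$. Thus, the soundness parameter is as claimed. □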
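The following sketch checks Proposition 3.6 numerically for $\mathbb{F} = GF(2)$, where the soundness bound $1 - \frac{2\delta}{3|\mathbb{F}|}$ becomes $1 - \delta/3$. It assumes that the two positions in a block are drawn independently (with replacement). The text only says “two random elements”, and the upper bound above covers either reading. The helper names are hypothetical, and the exhaustive check is only feasible for very small $n$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def blocks(v, k):
    """Split v into k consecutive blocks of equal length."""
    m = len(v) // k
    return [v[i * m:(i + 1) * m] for i in range(k)]

def dist_to_Cn(v, k):
    """Delta(v, C_n): a block's distance to B is its length minus the
    count of its most frequent symbol."""
    return Fraction(sum(len(b) - max(Counter(b).values()) for b in blocks(v, k)),
                    len(v))

def acceptance_prob(v, k):
    """Pick a uniform block and two positions in it independently and
    uniformly (with replacement: an assumption); accept iff equal."""
    total = Fraction(0)
    for b in blocks(v, k):
        total += sum(Fraction(c, len(b)) ** 2 for c in Counter(b).values())
    return total / k

def check_binary_soundness(n, k):
    """Exhaustively confirm acceptance <= 1 - delta/3 (delta = 1/k) for every
    binary word at distance >= delta/3 from C_n; small n only."""
    delta = Fraction(1, k)
    for bits in product("01", repeat=n):
        v = "".join(bits)
        if dist_to_Cn(v, k) >= delta / 3:
            assert acceptance_prob(v, k) <= 1 - delta / 3, v
    return True

print(dist_to_Cn("000011000000", 2))       # 1/6
print(acceptance_prob("000011000000", 2))  # 7/9
print(check_binary_soundness(12, 2))       # True
```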
4 Non-linear Codes

In this section we provide upper bounds on the code size of arbitrary (i.e., possi- 
bly non-linear) 2-locally testable codes. Our bounds apply only to binary codes 
and testers with perfect completeness, and with good reason: There exist good 
2-testable binary codes with non-perfect completeness (see [3, Sec. 6]) and there 
exist good 2-testable codes with perfect completeness over ternary alphabets 
(see [3, Sec. 5]). Our main result is: 

Theorem 4.1 (Theorem 2.2, restated): If $C \subseteq \{0,1\}^n$ is a $[2,1,s]$-locally testable code with minimal relative distance $\delta$ and $s < 1$, then

$$|C| \leq 2^{3/\delta}.$$

The proof (presented below) generalizes that of the binary linear case (binary means $\mathbb{F} = GF(2)$), with some necessary modifications, which we briefly outline now. In the binary linear case a test querying $x_i$ and $x_j$ forces $x_i = x_j$ for all codewords (this is the only possible linear constraint of size two over $GF(2)$). In that case, the set of all tests corresponds to an undirected graph in which each connected component forces all variables to have the same value. In the non-linear case a test (adaptive or non-adaptive) corresponds to a 2-CNF. (Recall that in both cases we deal with perfect completeness testers.) The set of all tests (which is itself a 2-CNF) corresponds to a directed graph of constraints on codewords, where the constraint $x_i \vee x_j$ translates to the pair of directed edges $\bar{x}_i \to x_j$ and $\bar{x}_j \to x_i$. In the resulting directed graph, a strongly connected component takes the role played by the connected component in the linear case. Namely, for any codeword, all variables in a strongly connected component are fixed by the value of a single variable in the component. As in the linear case, we use the properties of the code and its tester to show that the weight of the small strongly connected components is small. These properties are the code's large distance and the fact that the tester rejects any word that is far from the code with non-zero probability. Hence, the code is determined by a small number of large strongly connected components.



Proof of Theorem 4.1 

Again, we view the $n$ codeword coordinates as variables and the set of tests (which are 2-CNFs) as inducing constraints on these variables. We stress that each test (even an adaptive one) can be represented by a 2-CNF.⁶ Let $\mathcal{F}$ be the conjunction of all non-trivial deterministic tests that are used by a 2-query tester that has perfect completeness with respect to $C$. We look at the satisfying assignments of $\mathcal{F}$, and use this to bound the size of $C$. If $\mathcal{F}$ includes a clause of size 1 then $C$ is redundant. Thus, assuming non-redundancy of $C$ implies that $\mathcal{F}$ can be represented by a 2-CNF in which each clause has exactly two literals.

⁶ In general, an adaptive test querying $k$ variables is a decision tree of depth $k$. It is easy to verify that (the function computed by) such a tree can be represented both as a $k$-CNF and as a $k$-DNF.

We examine the following directed graph $G_{\mathcal{F}}$. The vertex set of $G_{\mathcal{F}}$ is the set of literals $\{x_1, \bar{x}_1, \ldots, x_n, \bar{x}_n\}$. For each clause $(\ell \vee \ell') \in \mathcal{F}$ we introduce in $G_{\mathcal{F}}$ one directed edge from $\bar{\ell}$ to $\ell'$ and one from $\bar{\ell}'$ to $\ell$. We use the notation $\ell \rightsquigarrow \ell'$ to indicate the existence of a directed path from $\ell$ to $\ell'$ in $G_{\mathcal{F}}$. We use the notation $w(\ell)$ to denote the value of literal $\ell$ under assignment $w$ to the underlying variables. Identifying True with 1 and False with 0, we have

Claim 4.2 (folklore): The following two conditions are equivalent:

1. The assignment $w$ satisfies $\mathcal{F}$.

2. For every directed edge $\ell \to \ell'$ it holds that $w(\ell) \leq w(\ell')$.
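Claim 4.2 is easy to check exhaustively on a small formula. In this sketch a literal is a non-zero integer: $+i$ stands for $x_i$ and $-i$ for $\bar{x}_i$. Each clause is a pair of literals. The names and the clauses are hypothetical.

```python
from itertools import product

def implication_edges(clauses):
    """Clause (l or l') yields the edges -l -> l' and -l' -> l.
    Literals are non-zero ints: +i stands for x_i and -i for its negation."""
    edges = []
    for l1, l2 in clauses:
        edges.append((-l1, l2))
        edges.append((-l2, l1))
    return edges

def value(lit, w):
    """Value (0/1) of a literal under w, a dict mapping variable -> 0/1."""
    return w[abs(lit)] if lit > 0 else 1 - w[abs(lit)]

def satisfies(clauses, w):
    return all(value(a, w) == 1 or value(b, w) == 1 for a, b in clauses)

def monotone_on_edges(edges, w):
    """Condition 2 of Claim 4.2: w(l) <= w(l') for every edge l -> l'."""
    return all(value(a, w) <= value(b, w) for a, b in edges)

# Hypothetical 2-CNF on x1, x2, x3: (x1 or not x2), (x2 or x3), (not x1 or not x3).
clauses = [(1, -2), (2, 3), (-1, -3)]
edges = implication_edges(clauses)
for bits in product((0, 1), repeat=3):
    w = {i + 1: b for i, b in enumerate(bits)}
    assert satisfies(clauses, w) == monotone_on_edges(edges, w)
print("Claim 4.2 holds on all 8 assignments")
```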

A strongly connected component in a directed graph $G$ is a maximal set of vertices $C \subseteq V(G)$ such that for any $v, v' \in C$ it holds that $v \rightsquigarrow v'$. For two strongly connected components $C$ and $C'$ in $G$, we say $C \rightsquigarrow C'$ iff there exist $v \in C$ and $v' \in C'$ such that $v \rightsquigarrow v'$. (Indeed, this happens iff for all $v \in C, v' \in C'$ it holds that $v \rightsquigarrow v'$.)

By Claim 4.2, $w$ satisfies all constraints corresponding to edges of a strongly connected component $C$ iff $w(\ell) = w(\ell')$ for all $\ell, \ell' \in C$. So, any satisfying assignment $w$ either sets to 1 all literals in $C$, or sets them all to 0. In the first case we say that $w(C) = 1$ and in the latter we say $w(C) = 0$.
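Strongly connected components of $G_{\mathcal{F}}$ can be computed with any standard linear-time algorithm. The sketch below uses Kosaraju's algorithm; that choice is ours, since the paper does not prescribe one. It runs on a hypothetical formula forcing $x_1 = x_2 = x_3$, with $x_4$ unconstrained, and then confirms that every satisfying assignment is constant on each component. Literals are encoded as non-zero integers ($+i$ for $x_i$, $-i$ for $\bar{x}_i$), and the names are hypothetical.

```python
from collections import defaultdict
from itertools import product

def sccs(vertices, edges):
    """Strongly connected components via Kosaraju's algorithm
    (recursive DFS, adequate for small graphs)."""
    graph, rev = defaultdict(list), defaultdict(list)
    for a, b in edges:
        graph[a].append(b)
        rev[b].append(a)
    seen, order = set(), []

    def dfs_forward(u):
        seen.add(u)
        for x in graph[u]:
            if x not in seen:
                dfs_forward(x)
        order.append(u)  # u finishes after all vertices first reached from it

    for v in vertices:
        if v not in seen:
            dfs_forward(v)
    label = {}

    def dfs_reverse(u, root):
        label[u] = root
        for x in rev[u]:
            if x not in label:
                dfs_reverse(x, root)

    for v in reversed(order):  # process in decreasing finishing time
        if v not in label:
            dfs_reverse(v, v)
    groups = defaultdict(list)
    for v, root in label.items():
        groups[root].append(v)
    return sorted(sorted(g) for g in groups.values())

def lit_value(lit, bits):
    """Value of literal lit (+i or -i) under bits, where bits[i-1] is x_i."""
    return bits[abs(lit) - 1] if lit > 0 else 1 - bits[abs(lit) - 1]

# Hypothetical clauses forcing x1 = x2 = x3; x4 is unconstrained.
clauses = [(1, -2), (-1, 2), (2, -3), (-2, 3)]
edges = [e for l1, l2 in clauses for e in ((-l1, l2), (-l2, l1))]
literals = [s * i for i in range(1, 5) for s in (1, -1)]
components = sccs(literals, edges)
print(components)  # [[-4], [-3, -2, -1], [1, 2, 3], [4]]

# Every satisfying assignment is constant on each strongly connected component.
for bits in product((0, 1), repeat=4):
    if all(lit_value(a, bits) or lit_value(b, bits) for a, b in clauses):
        assert all(len({lit_value(l, bits) for l in comp}) == 1
                   for comp in components)
```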

Let $L$ be the set of literals belonging to large strongly connected components, where a component is called large iff its cardinality is at least $\delta n/3$. Consider an arbitrary assignment $\rho'$ to the variables of $L$ that can be extended to a satisfying assignment (to $\mathcal{F}$). In particular, $\rho'$ does not falsify any clause of $\mathcal{F}$ (i.e., no clause of $\mathcal{F}$ is set to 0 by $\rho'$). A literal $\ell \notin L$ is said to be forced by $\rho'$ if there exists $\ell' \in L$ such that $\ell' \rightsquigarrow \ell$ and $\rho'(\ell') = 1. The name is justified because any satisfying assignment to $\mathcal{F}$ that extends $\rho'$ must set $\ell$ to 1: by Claim 4.2, such an assignment $\rho$ must satisfy $\rho(\ell) \geq \rho(\ell') = 1$. Accordingly, the complementary literal (i.e., $\bar{\ell}$) is forced to 0. Let $\rho$ be the closure of $\rho'$ obtained by (iteratively) fixing all forced literals to the value 1 (and their complementary literals to 0). By definition, $\rho$ does not falsify $\mathcal{F}$. Let $S_\rho$ be the set of unfixed variables under $\rho$.
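Under this reading, the closure $\rho$ amounts to a reachability computation in $G_{\mathcal{F}}$: every literal reachable from a literal that is true under $\rho'$ gets fixed. Below is a minimal sketch with hypothetical names and clauses. Literals are non-zero integers ($+i$ for $x_i$, $-i$ for $\bar{x}_i$). The sketch also reports an error if $\rho'$ cannot be extended.

```python
from collections import defaultdict, deque

def closure(clauses, partial):
    """Closure rho of a partial assignment rho' (dict: variable -> 0/1).
    Every literal reachable in the implication graph from a literal that is
    true under rho' is fixed to 1 and its complement to 0."""
    graph = defaultdict(list)
    for l1, l2 in clauses:  # clause (l1 or l2): -l1 -> l2 and -l2 -> l1
        graph[-l1].append(l2)
        graph[-l2].append(l1)
    start = [var if bit else -var for var, bit in partial.items()]
    reached, queue = set(start), deque(start)
    while queue:
        u = queue.popleft()
        for x in graph[u]:
            if x not in reached:
                reached.add(x)
                queue.append(x)
    rho = {}
    for lit in reached:
        bit = 1 if lit > 0 else 0
        if rho.setdefault(abs(lit), bit) != bit:
            raise ValueError("rho' does not extend to a satisfying assignment")
    return rho

# Hypothetical 2-CNF: (not x1 or x2), (not x2 or x3), (x4 or x5).
clauses = [(-1, 2), (-2, 3), (4, 5)]
rho = closure(clauses, {1: 1})
print(sorted(rho.items()))                  # [(1, 1), (2, 1), (3, 1)]
print(sorted({1, 2, 3, 4, 5} - set(rho)))   # unfixed variables: [4, 5]
```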

Claim 4.3 For any closure $\rho$ of an assignment that satisfies $L$, it holds that $|S_\rho| < 2\delta n/3$.

Proof: Otherwise, let $C_1, \ldots, C_k$ be a topological ordering of the unfixed strongly connected components comprising $S_\rho$, where the ordering is according to $\rightsquigarrow$ (as defined above). (Indeed, the digraph defined on the $C_i$'s by $\rightsquigarrow$ is acyclic.) For $j = 0, \ldots, k$, let $v^{(j)}$ be the assignment extending $\rho$ defined by:

By Claim 4.2, each assignment $v^{(j)}$ satisfies $\mathcal{F}$. Since $C$ is 2-locally testable with soundness $s < 1$, each word that is at distance at least $\delta/3$ from $C$ must falsify some clause in $\mathcal{F}$. But since $v^{(j)}$ satisfies $\mathcal{F}$, it must be that $v^{(j)}$ is at distance less than $\delta/3$ from some codeword, denoted $w^{(j)}$. By the contradiction hypothesis, we have $\Delta(v^{(0)}, v^{(k)}) = |S_\rho|/n \geq 2\delta/3$. This implies $w^{(0)} \neq w^{(k)}$, because $\Delta(v^{(0)}, v^{(k)}) \leq \Delta(v^{(0)}, w^{(0)}) + \Delta(w^{(0)}, w^{(k)}) + \Delta(w^{(k)}, v^{(k)})$, which is smaller than $2 \cdot (\delta/3) + \Delta(w^{(0)}, w^{(k)})$. It follows that

$$\Delta(v^{(k)}, w^{(0)}) \geq \Delta(w^{(k)}, w^{(0)}) - \Delta(w^{(k)}, v^{(k)}) > \delta - \delta/3 = 2\delta/3.$$

On the other hand, recall that $\Delta(v^{(0)}, w^{(0)}) < \delta/3$. For each $j$, it holds that $\Delta(v^{(j-1)}, v^{(j)}) < \delta/3$ (because $|C_j| < \delta n/3$). Hence there must be $j^* \in \{0, 1, \ldots, k\}$ such that $\delta/3 \leq \Delta(v^{(j^*)}, w^{(0)}) \leq 2\delta/3$. For this $j^*$, it holds that $\Delta(v^{(j^*)}, C) \geq \delta/3$, since any codeword $w \neq w^{(0)}$ satisfies $\Delta(v^{(j^*)}, w) \geq \Delta(w, w^{(0)}) - \Delta(v^{(j^*)}, w^{(0)}) \geq \delta - 2\delta/3$. But $v^{(j^*)}$ satisfies $\mathcal{F}$ and so it is accepted by the tester with probability 1, in contradiction to the soundness condition. □

Our proof is nearly complete. As in the proof of Theorem 3.3, assume for the sake of contradiction that

$$|C| > 2^{3/\delta}.$$

In this case, there must be two distinct codewords $w \neq u$ that agree on all large strongly connected components. The large components come in at most $3/\delta$ complementary pairs, and each pair is set by a single bit. Let $\rho'$ be the restriction of $w$ to the variables of the large connected components. That is, $\rho'$ agrees with $w$ and with $u$ on the assignment to all variables in $L$ and is unfixed otherwise. Let $\rho$ be the closure of $\rho'$ (obtained by forcing as above). Note that $w$ and $u$ are satisfying assignments to $\mathcal{F}$ that agree on $\rho'$, so they also must agree on $\rho$ (which is forced by $\rho'$). Thus, by Claim 4.3,

$$0 < \Delta(u, w) \leq |S_\rho|/n < \delta.$$

This contradicts the hypothesis that the minimal distance of $C$ is $\delta$, and the theorem follows. □
Appendix: A General Proposition Regarding Property Testing

In Section 3.1, we used the fact that, without loss of generality, a perfect- 
completeness codeword-tester for a linear code makes only linear tests. This 
fact is a special case of the following general (folklore) proposition: 

Proposition A.1 Let $M$ be an oracle machine for the promise problem $(\Pi_{\mathrm{YES}}, \Pi_{\mathrm{NO}})$ such that for every $x \in \Pi_{\mathrm{YES}}$ it holds that $\Pr[M^x = 1] = 1$ (i.e., $M$ has perfect completeness). Then, modifying $M$ so that it outputs 1 if and only if its view is consistent with some $x' \in \Pi_{\mathrm{YES}}$ can only improve its performance. That is, denoting the modified machine by $M'$, we have $\Pr[M'^x = 1] = 1$ for every $x \in \Pi_{\mathrm{YES}}$ and $\Pr[M'^x = 1] \leq \Pr[M^x = 1]$ for every $x$.

In our case, the property being tested is membership in a certain linear subspace, and thus consistency (among two answers) means satisfying a linear condition.

Proof: Let us fix the contents $r$ of the random-tape of $M$, and denote by $\mathrm{view}^x(r)$ the view of machine $M$ on random-tape $r$ and access to oracle $x$. Then, machine $M'$ accepts on random-tape $r$ and access to oracle $x$ if and only if $\mathrm{view}^x(r)$ equals $\mathrm{view}^{x'}(r)$ for some $x' \in \Pi_{\mathrm{YES}}$. This condition may be determined by scanning all $x' \in \Pi_{\mathrm{YES}}$ and computing the corresponding $\mathrm{view}^{x'}(r)$'s. Clearly, $\Pr[M'^x = 1] = 1$ for every $x \in \Pi_{\mathrm{YES}}$ (by considering $x' = x$). On the other hand, for every $x$ and $r$, if $M^x(r) \neq 1$ then by the one-sided feature of $M$ it must be that $\mathrm{view}^x(r)$ differs from $\mathrm{view}^{x'}(r)$ for all $x' \in \Pi_{\mathrm{YES}}$. It follows that $M'^x(r) \neq 1$ too. Thus, $\Pr[M'^x \neq 1] \geq \Pr[M^x \neq 1]$, and the proposition follows. □
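Proposition A.1 can be illustrated on a toy promise problem. The sketch makes two simplifications: queries are non-adaptive, and the random tape is drawn uniformly from a finite list. The names are hypothetical. It builds $M'$ from $M$ by replacing each decision with a consistency check against the yes-instances.

```python
from fractions import Fraction
from itertools import product

def view(x, queries):
    return tuple(x[q] for q in queries)

def acceptance(machine, x):
    """machine: list of (queries, decision) pairs, one per equally likely
    random tape; decision maps the view to 0/1. Non-adaptive queries are a
    simplifying assumption of this sketch."""
    return Fraction(sum(decide(view(x, qs)) for qs, decide in machine),
                    len(machine))

def consistency_version(machine, yes_instances):
    """M': on each tape, accept iff the view equals that of some yes-instance."""
    def decision_for(qs):
        allowed = {view(y, qs) for y in yes_instances}
        return lambda vw: 1 if vw in allowed else 0
    return [(qs, decision_for(qs)) for qs, _ in machine]

def equal(vw):
    return 1 if vw[0] == vw[1] else 0

def always(vw):
    return 1

# Hypothetical one-sided machine M for the property {000, 111}; its third
# tape wastefully accepts everything.
yes = [(0, 0, 0), (1, 1, 1)]
M = [((0, 1), equal), ((1, 2), equal), ((0, 2), always)]
M_prime = consistency_version(M, yes)
print(acceptance(M, (0, 0, 1)), acceptance(M_prime, (0, 0, 1)))  # 2/3 1/3
```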
Abstract. We study the Lovász number $\vartheta$ along with two further SDP relaxations $\vartheta_{1/2}$, $\vartheta_2$ of the independence number and the corresponding relaxations $\bar\vartheta$, $\bar\vartheta_{1/2}$, $\bar\vartheta_2$ of the chromatic number on random graphs $G_{n,p}$. We prove that $\bar\vartheta(G_{n,p})$, $\bar\vartheta_{1/2}(G_{n,p})$, $\bar\vartheta_2(G_{n,p})$ in the case $p < n^{-1/2-\varepsilon}$ are concentrated in intervals of constant length. Moreover, we estimate the probable value of $\vartheta(G_{n,p})$, $\bar\vartheta(G_{n,p})$ etc. for essentially the entire range of edge probabilities $p$. As applications, we give improved algorithms for approximating $\alpha(G_{n,p})$ and for deciding $k$-colorability in polynomial expected time.



1 Introduction and Results 

Given a graph $G = (V, E)$, let $\alpha(G)$ be the independence number, let $\omega(G)$ be the clique number, and let $\chi(G)$ be the chromatic number of $G$. Further, let $\bar G$ signify the complement of $G$. Since it is NP-hard to compute any of $\alpha(G)$, $\omega(G)$ or $\chi(G)$, it is remarkable that there exists an efficiently computable function $\vartheta(G)$ that is “sandwiched” between $\alpha(G)$ and $\chi(\bar G)$, i.e., $\alpha(G) \leq \vartheta(G) \leq \chi(\bar G)$. Passing to complements, and letting $\bar\vartheta(G) = \vartheta(\bar G)$, we have $\omega(G) \leq \bar\vartheta(G) \leq \chi(G)$. The function $\vartheta$ was introduced by Lovász, and is called the Lovász number of $G$ (cf. [16,21]).

Though $\vartheta(G)$ is sandwiched between $\alpha(G)$ and $\chi(\bar G)$, Feige [7] proved that the gap between $\alpha(G)$ and $\vartheta(G)$, or between $\chi(G)$ and $\bar\vartheta(G)$, can be as large as $n^{1-\varepsilon}$, $\varepsilon > 0$. Indeed, unless NP = coRP, none of $\alpha(G)$, $\omega(G)$, $\chi(G)$ can be approximated within a factor of $n^{1-\varepsilon}$, $\varepsilon > 0$, in polynomial time [17,9]. However, though there exist graphs $G$ such that $\vartheta(G)$ is not a good approximation of $\alpha(G)$ (or $\bar\vartheta(G)$ of $\chi(G)$), it might be the case that the Lovász number performs well on “average” instances. In fact, several algorithms for random and semirandom graph problems are based on computing $\vartheta$ [4,5,8]. Therefore, the aim of this paper is to study the Lovász number of random graphs more thoroughly.

* supported by the Deutsche Forschungsgemeinschaft (grant DFG FOR 413/1-1).

S. Arora et al. (Eds.): APPROX 2003+RANDOM 2003, LNCS 2764, pp. 228-239, 2003.

The standard model of a random graph is the binomial model $G_{n,p}$, pioneered by Erdős and Rényi. We let $0 < p = p(n) < 1$ be a number that may depend on $n$. Let $V = \{1, \ldots, n\}$. Then the random graph $G_{n,p}$ is obtained by including each of the $\binom{n}{2}$ possible edges $\{v, w\}$, $v, w \in V$, with probability $p$ independently. Though $G_{n,p}$ may fail to model some types of input instances appropriately, both the combinatorial structure and the algorithmic theory of $G_{n,p}$ are of fundamental interest [18,12]. We say that $G_{n,p}$ has some property $\mathcal{A}$ with high probability (whp.), if $\lim_{n\to\infty} P(G_{n,p} \text{ has property } \mathcal{A}) = 1$.
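For experimentation, $G_{n,p}$ can be sampled directly from the definition above. The minimal sketch below uses only Python's standard library (`random`, `itertools.combinations`). The function name is hypothetical, and vertices are $1, \ldots, n$ as in the text.

```python
import random
from itertools import combinations

def sample_gnp(n, p, rng=random):
    """Binomial random graph G_{n,p} on V = {1, ..., n}: each of the
    C(n, 2) possible edges {v, w} is included independently with probability p."""
    return {(v, w) for v, w in combinations(range(1, n + 1), 2)
            if rng.random() < p}

rng = random.Random(2003)
G = sample_gnp(200, 0.05, rng)
print(len(G))  # random; its expectation is 0.05 * C(200, 2) = 995
```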

We also address two further SDP relaxations $\vartheta_{1/2}$, $\vartheta_2$ of $\alpha$ (cf. [27]) on random graphs. These relaxations satisfy $\alpha(G) \leq \vartheta_{1/2}(G) \leq \vartheta(G) \leq \vartheta_2(G) \leq \chi(\bar G)$ for all $G$. Passing to complements, and setting $\bar\vartheta_i(G) = \vartheta_i(\bar G)$ ($i = 1/2, 2$), one gets $\omega(G) \leq \bar\vartheta_{1/2}(G) \leq \bar\vartheta(G) \leq \bar\vartheta_2(G) \leq \chi(G)$. The relaxation $\bar\vartheta_{1/2}(G)$ coincides with the well-known vector chromatic number $\vec\chi(G)$ of Karger, Motwani, and Sudan [20].



The Concentration of $\bar\vartheta_{1/2}$, $\bar\vartheta$, $\bar\vartheta_2$. A remarkable fact concerning the chromatic number of sparse random graphs $G_{n,p}$ is that $\chi(G_{n,p})$ is concentrated in an interval of constant length. Indeed, Shamir and Spencer [26] proved that there is a function $u = u(n,p)$ such that in the case $p = n^{-\beta}$, $1/2 < \beta < 1$, we have $P(u \le \chi(G_{n,p}) \le u + \lceil(2\beta+1)/(2\beta-1)\rceil) = 1 - o(1)$. Furthermore, Łuczak [25] showed that in the case $5/6 < \beta < 1$, the chromatic number is concentrated in width one. In fact, Alon and Krivelevich [2] proved that two-point concentration holds for the entire range $p = n^{-\beta}$, $1/2 < \beta < 1$.

The following two theorems state results similar to those of Shamir and Spencer and of Łuczak for the relaxations $\bar\vartheta_{1/2}(G_{n,p})$, $\bar\vartheta(G_{n,p})$, and $\bar\vartheta_2(G_{n,p})$ of the chromatic number.

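Theorem 1. Suppose that $c_0/n \le p \le n^{-\beta}$ for some large constant $c_0 > 0$ and some number $1/2 < \beta < 1$. Then $\bar\vartheta_{1/2}(G_{n,p})$, $\bar\vartheta(G_{n,p})$, $\bar\vartheta_2(G_{n,p})$ are concentrated in width $s$, where $s$ is the constant (depending on $\beta$) provided by L. 11; i.e. there exist numbers $u, u', u''$ depending on $n$ and $p$ such that whp. $u \le \bar\vartheta_{1/2}(G_{n,p}) \le u + s$, $u' \le \bar\vartheta(G_{n,p}) \le u' + s$, and $u'' \le \bar\vartheta_2(G_{n,p}) \le u'' + s$.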



Theorem 2. Suppose that $c_0/n \le p \le n^{-5/6-\delta}$ for some large constant $c_0$ and some $\delta > 0$. Then $\bar\vartheta_{1/2}(G_{n,p})$, $\bar\vartheta(G_{n,p})$, and $\bar\vartheta_2(G_{n,p})$ are concentrated in width 1.

In contrast to the chromatic number, $\bar\vartheta_{1/2}$, $\bar\vartheta$, and $\bar\vartheta_2$ need not be integral. Therefore, the above results do not imply that $\bar\vartheta_{1/2}(G_{n,p})$, $\bar\vartheta(G_{n,p})$, $\bar\vartheta_2(G_{n,p})$ are concentrated on a constant number of points.



The Probable Value of $\vartheta(G_{n,p})$, $\bar\vartheta(G_{n,p})$, etc. Concerning the probable value of $\vartheta(G_{n,p})$ and $\bar\vartheta(G_{n,p})$, Juhász [19] gave the following partial answer: if $\ln(n)^6/n \ll p \le 1/2$, then with high probability we have $\vartheta(G_{n,p}) = \Theta(\sqrt{n/p})$ and $\bar\vartheta(G_{n,p}) = \Theta(\sqrt{np})$. However, we shall indicate in Sec. 4 that Juhász's proof fails in the case of sparse random graphs (e.g. $np = O(1)$). Making use of concentration results on $\vartheta$ etc., we can compute the probable value not only of $\vartheta(G_{n,p})$ and $\bar\vartheta(G_{n,p})$, but also of $\vartheta_i(G_{n,p})$ and $\bar\vartheta_i(G_{n,p})$, $i = 1/2, 2$, for essentially the entire range of edge probabilities $p$.
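Theorem 3. Suppose that $c_0/n \le p \le 1/2$ for some large constant $c_0 > 0$. Then there exist constants $c_1, c_2, c_3, c_4 > 0$ such that

$$c_1\sqrt{n/p} \le \vartheta_{1/2}(G_{n,p}) \le \vartheta(G_{n,p}) \le \vartheta_2(G_{n,p}) \le c_2\sqrt{n/p} \qquad (1)$$

and $c_3\sqrt{np} \le \bar\vartheta_{1/2}(G_{n,p}) \le \bar\vartheta(G_{n,p}) \le \bar\vartheta_2(G_{n,p}) \le c_4\sqrt{np}$

with high probability. More precisely,

$$P\left(c_3\sqrt{np} \le \bar\vartheta_{1/2}(G_{n,p}) \le \bar\vartheta(G_{n,p}) \le \bar\vartheta_2(G_{n,p})\right) \ge 1 - \exp(-n). \qquad (2)$$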

Assume that $c_0/n \le p = o(1)$. Then $\alpha(G_{n,p}) \sim 2\ln(np)/p$ and $\chi(G_{n,p}) \sim np/(2\ln(np))$ whp. (cf. [18]). Hence, Thm. 3 shows that $\vartheta_2(G_{n,p})$ ($\bar\vartheta_{1/2}(G_{n,p})$) approximates $\alpha(G_{n,p})$ ($\chi(G_{n,p})$) within a factor of $O(\sqrt{np})$. In fact, if $np = O(1)$, then we get a constant factor approximation. Our estimate on the vector chromatic number $\bar\vartheta_{1/2}(G_{n,p})$ answers a question of Krivelevich [22].
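For orientation, the $O(\sqrt{np})$ factor can be read off by dividing the bounds of Thm. 3 by the asymptotic values just quoted (a back-of-the-envelope calculation, valid whp. when $c_0/n \le p = o(1)$):

$$\frac{\vartheta_2(G_{n,p})}{\alpha(G_{n,p})} \le \frac{c_2\sqrt{n/p}}{(1-o(1))\,2\ln(np)/p} = (1+o(1))\,\frac{c_2\sqrt{np}}{2\ln(np)} = O(\sqrt{np}),$$

$$\frac{\chi(G_{n,p})}{\bar\vartheta_{1/2}(G_{n,p})} \le \frac{(1+o(1))\,np/(2\ln(np))}{c_3\sqrt{np}} = (1+o(1))\,\frac{\sqrt{np}}{2c_3\ln(np)} = O(\sqrt{np}).$$

If $np = O(1)$, both right-hand sides are bounded by a constant, which is the constant-factor approximation mentioned above.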

Finally, consider the random regular graph $G_{n,r}$. The proof of the following theorem is somewhat technical, and is omitted.

Theorem 4. Let $c_0$ be a sufficiently large constant, and let $c_0 \le r = o(n^{1/2})$. There are constants $c_1, c_2 > 0$ such that whp. the random regular graph $G_{n,r}$ satisfies $c_1 n/\sqrt{r} \le \vartheta_{1/2}(G_{n,r}) \le \vartheta(G_{n,r}) \le \vartheta_2(G_{n,r}) \le c_2 n/\sqrt{r}$. Moreover, there is a constant $c_3 > 0$ such that in the case $c_0 \le r \le n/2$ we have

$$P\left(c_3\sqrt{r} \le \bar\vartheta_{1/2}(G_{n,r}) \le \bar\vartheta(G_{n,r}) \le \bar\vartheta_2(G_{n,r})\right) \ge 1 - \exp(-n).$$

Algorithmic Applications. There are two types of algorithms for NP-hard random graph problems. First, there are heuristics that always run in polynomial time, and almost always output a good solution. On the other hand, there are algorithms that guarantee some approximation ratio on any input instance, and which have a polynomial expected running time when applied to $G_{n,p}$. In this paper, we deal with algorithms with a polynomial expected running time.

First, we consider the maximum independent set problem in random graphs. Krivelevich and Vu [23] gave an algorithm that, in the case $p \gg n^{-1/2+\varepsilon}$, approximates the independence number of $G_{n,p}$ in polynomial expected time within a factor of $O(\sqrt{np}/\ln(np))$. Moreover, they ask whether a similar algorithm exists for smaller values of $p$. As a first answer, Coja-Oghlan and Taraz [4] gave an $O(\sqrt{np}/\ln(np))$-approximation algorithm for the case $p \gg \ln(n)^6/n$.

Theorem 5. Suppose that $c_0/n \le p \le 1/2$. There is an algorithm ApproxMIS that for any input graph $G$ outputs an independent set of size at least

$$\frac{\ln(np)}{c_1\sqrt{np}}\,\alpha(G),$$

and which applied to $G_{n,p}$ runs in polynomial expected time. Here $c_0, c_1 > 0$ denote constants.

As a second application, we give an algorithm for deciding within polynomial expected time whether the input graph is $k$-colorable. Instead of $G_{n,p}$, we shall even consider the semirandom model $G^+_{n,p}$ that allows for an adversary to add edges to the random graph. We say that the expected running time of an algorithm $\mathcal{A}$ is polynomial over $G^+_{n,p}$ if there is some constant $l$ such that the expected running time of $\mathcal{A}$ is $O(n^l)$ regardless of the behavior of the adversary.



The Lovasz Number of Random Graphs 






Theorem 6. Suppose that $k = o(\sqrt{n})$, and that $p \ge c_0k^2/n$ for some constant $c_0 > 0$. There exists an algorithm Decide$_k$ that for any input graph $G$ decides whether $G$ is $k$-colorable, and that applied to $G^+_{n,p}$ has a polynomial expected running time.

The algorithm Decide$_k$ is essentially identical to Krivelevich's algorithm for deciding $k$-colorability in polynomial expected time [22]. However, the analysis given in [22] requires that $np \ge \exp(\Omega(k))$. The improvement results from the fact that the analysis given in this paper relies on the asymptotics for $\bar\vartheta_{1/2}(G_{n,p})$ derived in Thm. 3 (instead of the concept of semi-colorings). Finally, we mention that our algorithm Decide$_k$ also applies to random regular graphs $G_{n,r}$.

Theorem 7. Suppose that $c_0k^2 \le r \le n/2$ for some constant $c_0 > 0$. Then, applied to $G_{n,r}$, the algorithm Decide$_k$ has polynomial expected running time.

Notation. Throughout we let $V = \{1, \dots, n\}$. If $G = (V,E)$ is a graph, then $A(G)$ is the adjacency matrix of $G$. By $\mathbf{1}$ we denote the vector with all entries equal to 1, and $J$ denotes a square matrix with all entries equal to 1. If $M$ is a real symmetric $n \times n$ matrix, then $\lambda_1(M) \ge \dots \ge \lambda_n(M)$ signify the eigenvalues of $M$.

2 Preliminaries 

Let $G = (V,E)$ be a graph, let $(v_1, \dots, v_n)$ be an $n$-tuple of unit vectors in $\mathbb{R}^n$, and let $k > 1$. Then $(v_1, \dots, v_n)$ is a vector $k$-coloring of $G$ if $\langle v_i, v_j\rangle \le -1/(k-1)$ for all edges $\{i,j\} \in E$. Furthermore, $(v_1, \dots, v_n)$ is a strict vector $k$-coloring if $\langle v_i, v_j\rangle = -1/(k-1)$ for all $\{i,j\} \in E$. Finally, we say that $(v_1, \dots, v_n)$ is a rigid vector $k$-coloring if $\langle v_i, v_j\rangle = -1/(k-1)$ for all $\{i,j\} \in E$ and $\langle v_i, v_j\rangle \ge -1/(k-1)$ for all $\{i,j\} \notin E$. Following [20,14,3], we define

$$\begin{aligned}
\vartheta_{1/2}(G) &= \inf\{k > 1 \mid \bar G \text{ admits a vector } k\text{-coloring}\},\\
\vartheta(G) = \vartheta_1(G) &= \inf\{k > 1 \mid \bar G \text{ admits a strict vector } k\text{-coloring}\},\\
\vartheta_2(G) &= \inf\{k > 1 \mid \bar G \text{ admits a rigid vector } k\text{-coloring}\}.
\end{aligned} \qquad (3)$$

Observe that $\bar\vartheta_{1/2}(G)$ is precisely the vector chromatic number introduced by Karger, Motwani, and Sudan [20]; $\vartheta_2$ occurs in [14,27]. Further, we let $\bar\vartheta_{1/2}(G) = \vartheta_{1/2}(\bar G)$, $\bar\vartheta(G) = \bar\vartheta_1(G) = \vartheta(\bar G)$, and $\bar\vartheta_2(G) = \vartheta_2(\bar G)$. It is shown in [20] that the above definition of $\vartheta$ is equivalent to Lovász's original definition (cf. [16]).
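The three notions of vector coloring differ only in which inner-product constraints are imposed. Below is a minimal pure-Python sketch of the definitions; the helper names, the 0-based vertex indexing, and the floating-point tolerance `tol` are assumptions of this sketch. As a sanity check it uses the regular simplex, whose $k$ unit vectors have pairwise inner product exactly $-1/(k-1)$ and hence form a strict (and rigid) vector $k$-coloring of the complete graph $K_k$.

```python
import itertools
import math


def inner(u, v):
    return sum(a * b for a, b in zip(u, v))


def _all_unit(vectors, tol):
    return all(abs(inner(v, v) - 1.0) <= tol for v in vectors)


def is_vector_coloring(vectors, edges, k, tol=1e-9):
    """<v_i, v_j> <= -1/(k-1) for every edge {i, j}."""
    bound = -1.0 / (k - 1)
    return _all_unit(vectors, tol) and all(
        inner(vectors[i], vectors[j]) <= bound + tol for i, j in edges)


def is_strict_vector_coloring(vectors, edges, k, tol=1e-9):
    """<v_i, v_j> = -1/(k-1) for every edge {i, j}."""
    bound = -1.0 / (k - 1)
    return _all_unit(vectors, tol) and all(
        abs(inner(vectors[i], vectors[j]) - bound) <= tol for i, j in edges)


def is_rigid_vector_coloring(vectors, edges, k, tol=1e-9):
    """Strict on edges, and <v_i, v_j> >= -1/(k-1) on all non-edges."""
    bound = -1.0 / (k - 1)
    edge_set = {frozenset(e) for e in edges}
    non_edges = [pair for pair in itertools.combinations(range(len(vectors)), 2)
                 if frozenset(pair) not in edge_set]
    return is_strict_vector_coloring(vectors, edges, k, tol) and all(
        inner(vectors[i], vectors[j]) >= bound - tol for i, j in non_edges)


def simplex_vectors(k):
    """k unit vectors in R^k with pairwise inner product exactly -1/(k-1)."""
    vectors = []
    for i in range(k):
        # e_i minus the centroid (1/k, ..., 1/k), normalized to unit length.
        v = [(1.0 if j == i else 0.0) - 1.0 / k for j in range(k)]
        norm = math.sqrt(inner(v, v))
        vectors.append([x / norm for x in v])
    return vectors
```

For example, `simplex_vectors(4)` with the six edges of $K_4$ passes all three checks for `k = 4`, fails the vector 3-coloring check, and is a vector 5-coloring but not a strict one, reflecting how the constraints tighten or relax with $k$.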

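Proposition 8. Let $G = (V,E)$ be a graph of order $n$, and let $S \subset V$. Let $G[S]$ denote the subgraph of $G$ induced on $S$. Then $\bar\vartheta_i(G) \le \bar\vartheta_i(G[S]) + \bar\vartheta_i(G[V \setminus S])$ and $\vartheta_i(G) \le \vartheta_i(G[S]) + \vartheta_i(G[V \setminus S])$, $i \in \{1/2, 1, 2\}$.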

It is obvious from the definitions that for any weak subgraph $H$ of $G$ we have $\bar\vartheta_i(H) \le \bar\vartheta_i(G)$, $i \in \{1/2, 1, 2\}$. In addition to $\vartheta$, $\vartheta_{1/2}$, and $\vartheta_2$, we consider the semidefinite relaxation of MAX CUT invented by Goemans and Williamson [15]: $\mathrm{SMC}(G) = \max \sum_{\{i,j\} \in E} \frac{1}{2}(1 - \langle v_i, v_j\rangle)$ s.t. $\|v_i\| = 1$, where the max is taken over $v_1, \dots, v_n \in \mathbb{R}^n$.
Finally, we need the following concentration result on $\vartheta$, $\vartheta_{1/2}$, and $\vartheta_2$. For $\vartheta$, the proof can be found in [5]. Using suitable characterizations of $\vartheta_{1/2}$ and $\vartheta_2$, the argument given in [5] can be adapted to cover these cases as well.

Theorem 9. Suppose that $p \le 0.99$, and that $n \ge n_0$ for a certain constant $n_0 > 0$. Let $m$ be a median of $\vartheta(G_{n,p})$.

i. Let $\xi \ge \max\{10, \sqrt{m}\}$. Then $P(\vartheta(G_{n,p}) \ge m + \xi) \le 30\exp(-\xi^2/(5m + 10\xi))$.

ii. Let $\xi \ge 10$. Then $P(\vartheta(G_{n,p}) \le m - \xi) \le 3\exp(-\xi^2/(10m))$.

The same holds with $\vartheta$ replaced by $\vartheta_{1/2}$ or by $\vartheta_2$.

3 The Concentration Results 

Proof of Thm. 1. Let $p$ and $\beta$ be as in Thm. 1. The proof is based on the following large deviation result, which is a consequence of Azuma's inequality.

Lemma 10. Suppose that $X: G_{n,p} \to \mathbb{R}$ is a random variable that satisfies the following conditions for all graphs $G = (V,E)$.

- For all $v \in V$ the following holds. Let $G^+_v = G + \{\{w,v\} \mid w \in V, w < v\}$, and let $G^-_v = G - \{\{v,w\} \mid w \in V, w < v\}$. Then $|X(G^+_v) - X(G^-_v)| \le 1$.

- If $H$ is a weak subgraph of $G$, then $X(H) \le X(G)$.

Then $P(|X - E(X)| \ge t\sqrt{n}) \le 2\exp(-t^2/2)$.

Let $\omega = \omega(n)$ be a sequence tending to infinity slowly, e.g. $\omega(n) = \ln\ln(n)$. Furthermore, let $k = k(n,p) = \inf\{x > 0 \mid P(\bar\vartheta_2(G_{n,p}) \le x) \ge \omega^{-1}\}$. For any graph $G = (V,E)$ let $Y(G) = \min\{\#U \mid U \subset V,\ \bar\vartheta_2(G - U) \le k\}$. Then $\bar\vartheta_2(G) \le k$ if and only if $Y(G) = 0$. Hence, $P(Y = 0) \ge \omega^{-1}$. Moreover, by Prop. 8, the random variable $Y$ satisfies the assumptions of L. 10. Let $\mu = E(Y)$. Then $\mu \le \sqrt{n\omega}$. Thus, by L. 10, $Y \le 2\sqrt{n\omega}$ with high probability. The following lemma is implicit in [26] (cf. the proof of L. 8 in [26]).

Lemma 11. Let $\delta > 0$. Whp. the random graph $G = G_{n,p}$ enjoys the following property. If $U \subset V$, $\#U \le 2\sqrt{n\omega}$, then $\chi(G[U]) \le s$, where $s$ is a suitable constant depending on $\beta$ and $\delta$.

To conclude the proof of Thm. 1, let $G = G_{n,p}$, and suppose that there is some $U \subset V$, $\#U \le 2\sqrt{n\omega}$, such that $\bar\vartheta_2(G - U) \le k \le \bar\vartheta_2(G)$. Since, by L. 11, $\bar\vartheta_2(G[U]) \le \chi(G[U]) \le s$ whp., Prop. 8 entails that $k \le \bar\vartheta_2(G) \le k + s$ whp.
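The two facts about $Y$ used in this proof follow from L. 10 applied to $X = Y$. A short worked version, for $n$ large enough that $2\ln(2\omega) \le \omega$: taking $t = \mu/\sqrt{n}$,

$$\frac{1}{\omega} \le P(Y = 0) \le P(|Y - \mu| \ge \mu) \le 2\exp\left(-\frac{\mu^2}{2n}\right), \quad\text{so}\quad \mu \le \sqrt{2n\ln(2\omega)} \le \sqrt{n\omega},$$

and then, taking $t = \sqrt{\omega}$,

$$P\left(Y \ge 2\sqrt{n\omega}\right) \le P\left(|Y - \mu| \ge \sqrt{n\omega}\right) \le 2\exp(-\omega/2) = o(1).$$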



Proof of Thm. 2. Let $\omega$ be a sequence tending to infinity slowly. Whp., the random graph $G = G_{n,p}$ admits no $U \subset V$, $\#U \le \omega\sqrt{n}$, spanning more than $\frac{3}{2}(1-\varepsilon)\#U$ edges, where $\varepsilon > 0$ is a small constant. Let $k$ be as in the proof of Thm. 1. Then whp. there is a set $U \subset V$, $\#U \le \omega\sqrt{n}$, such that $\bar\vartheta_2(G - U) \le k$. Following Łuczak [25], we let $U_0 = U$, and construct a sequence $U_0, \dots, U_m$ as follows. If there is no edge $\{v,w\} \in E$ with $v,w \in N(U_i) \setminus U_i$, then we let $m = i$ and finish. Otherwise, we let $U_{i+1} = U_i \cup \{v,w\}$ and continue. Then $m < m_0$ for a suitably chosen $m_0$, because otherwise $\#U_{m_0} = (2 + o(1))m_0$ and, as each step adds two vertices and at least three edges, $\#E(G[U_{m_0}]) \ge 3m_0 = 3(1 - o(1))\#U_{m_0}/2$, contradicting the property above. Let $R = U_m$.

By L. 11, $\bar\vartheta_2(G[R]) \le \chi(G[R]) \le 3$. Furthermore, $I = N(R) \setminus R$ is an independent set. Let $G_1 = G[R \cup I]$, $S = V \setminus (R \cup I)$, and $G_2 = G[S \cup I]$. Then $\bar\vartheta_2(G_2) \le k$, and $\bar\vartheta_2(G_1) \le 4$. In order to prove that $\bar\vartheta_2(G) \le k + 1$, we shall first construct a rigid vector $(k+1)$-coloring of $G_2$ that assigns the same vector to all vertices in $I$. Thus, let $(x_v)_{v \in S \cup I}$ be a rigid vector $k$-coloring of $G_2$. Let $x$ be a unit vector perpendicular to $x_v$ for all $v \in S$. Moreover, let $a = (k^2 - 1)^{-1/2}$, and set $y_v = (a^2 + 1)^{-1/2}(x_v - ax)$ for $v \in S$, and $y_v = x$ for $v \in I$. Then $(y_v)_{v \in S \cup I}$ is a rigid vector $(k+1)$-coloring of $G_2$. In a similar manner, we can construct a rigid vector 4-coloring $(y'_v)_{v \in R \cup I}$ of $G_1$ that assigns the same vector $x'$ to all vertices in $I$.
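The lifting step can be checked numerically. The sketch below is pure Python; the helper names and the toy instance are hypothetical. It starts from the simplex rigid vector $k$-coloring of $K_k$ on $S$, embeds it in $\mathbb{R}^{k+1}$ so that $x = e_{k+1}$ is perpendicular to all $x_v$, and applies $y_v = (1+a^2)^{-1/2}(x_v - ax)$ with $a = (k^2-1)^{-1/2}$. Edges inside $S$, and all pairs between $S$ and $I$, should then get inner product exactly $-1/k$, as a rigid vector $(k+1)$-coloring requires.

```python
import math


def inner(u, v):
    return sum(a * b for a, b in zip(u, v))


def simplex_vectors(k, dim):
    """k unit vectors in R^dim (dim >= k) with pairwise inner product -1/(k-1)."""
    vectors = []
    for i in range(k):
        v = [(1.0 if j == i else 0.0) - 1.0 / k for j in range(k)]
        norm = math.sqrt(inner(v, v))
        vectors.append([c / norm for c in v] + [0.0] * (dim - k))
    return vectors


def lift(xs, x, k):
    """y_v = (1 + a^2)^(-1/2) (x_v - a x) with a = (k^2 - 1)^(-1/2).

    Assumes x is a unit vector perpendicular to every x_v in xs.
    """
    a = 1.0 / math.sqrt(k * k - 1)
    scale = 1.0 / math.sqrt(1.0 + a * a)
    return [[scale * (c - a * d) for c, d in zip(xv, x)] for xv in xs]


if __name__ == "__main__":
    k = 5
    x = [0.0] * k + [1.0]           # the vector assigned to every vertex of I
    xs = simplex_vectors(k, k + 1)  # rigid vector k-coloring of K_k on S
    ys = lift(xs, x, k)
    # Both values below should equal -1/k = -0.2 up to rounding.
    print(inner(ys[0], ys[1]), inner(ys[0], x))
```

The choice of $a$ is exactly what makes $\langle y_v, x\rangle = -a/\sqrt{1+a^2}$ equal to $-1/k$; non-edges inside $S$ keep inner product at least $-1/k$ because the map only adds $a^2$ and rescales.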

Applying a suitable orthogonal transformation if necessary, we may assume that $x = x'$. Let $l = \max\{4, k+1\}$. Since $N(R) \subset R \cup I$, we obtain a rigid vector $l$-coloring $(z_v)_{v \in V}$ of $G$, where $z_v = y_v$ if $v \in S \cup I$, and $z_v = y'_v$ if $v \in R$. By the lower bound on $\bar\vartheta_2(G_{n,p})$ in Thm. 3 (which of course does not rely on Thm. 2), choosing $c_0$ large enough we may assume that $k \ge 4$, whence $k \le \bar\vartheta_2(G) \le k + 1$.



4 The Probable Value of $\vartheta(G_{n,p})$, $\bar\vartheta(G_{n,p})$, etc.



4.1 The Lower Bound on $\bar\vartheta_{1/2}(G_{n,p})$

To bound $\bar\vartheta_{1/2}(G_{n,p})$ from below, we make use of an estimate on the probable value of the SDP relaxation SMC of MAX CUT (cf. Sec. 2). Suppose that $c_0/n \le p \le 1 - c_0/n$ for some large constant $c_0 > 0$. Combining Thms. 4 and 5 of [6] instantly yields that there is a constant $A > 0$ such that

$$P\left(\mathrm{SMC}(G_{n,p}) > \frac{n^2p}{4} + An^{3/2}p^{1/2}(1-p)^{1/2}\right) < \exp(-2n). \qquad (4)$$

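Let $G = (V,E)$ be a graph with adjacency matrix $A = (a_{ij})_{i,j=1,\dots,n}$. Let $v_1, \dots, v_n$ be a vector $k$-coloring of $G$, where $k = \bar\vartheta_{1/2}(G) \ge 2$. Then $\|v_i\| = 1$ for all $i$, and $\langle v_i, v_j\rangle \le -1/(k-1)$ whenever $\{i,j\} \in E$. Therefore,

$$\mathrm{SMC}(G) \ge \sum_{i<j} \frac{a_{ij}}{2}\left(1 - \langle v_i, v_j\rangle\right) \ge \#E(G)\left(\frac{1}{2} + \frac{1}{2(k-1)}\right). \qquad (5)$$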



Let $c_0/n \le p \le 1 - c_0/n$ for some large constant $c_0 > 0$. By Chernoff bounds (cf. [18, p. 26]),

$$P\left(\#E(G_{n,p}) < \binom{n}{2}p - 8n^{3/2}p^{1/2}(1-p)^{1/2}\right) < \exp(-2n). \qquad (6)$$



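Combining (4), (5), and (6), and using that on the event complementary to (6) we have $n^2p/4 - \#E(G_{n,p})/2 \le np/4 + 4n^{3/2}p^{1/2}(1-p)^{1/2}$, we conclude that

$$\bar\vartheta_{1/2}(G_{n,p}) \ge \bar\vartheta_{1/2}(G_{n,p}) - 1 \ge \frac{\binom{n}{2}p - 8n^{3/2}p^{1/2}(1-p)^{1/2}}{2\left(\frac{np}{4} + (A+4)n^{3/2}p^{1/2}(1-p)^{1/2}\right)} \ge \frac{1}{8(A+5)}\sqrt{\frac{np}{1-p}}$$

holds with probability at least $1 - \exp(-n)$, provided $c_0$ is large enough. As the complement of $G_{n,p}$ is distributed as $G_{n,1-p}$, this proves (2) and the lower bounds in Thm. 3.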
4.2 Spectral Considerations

Let us briefly recall Juhász's proof that $\vartheta(G_{n,p}) \le (2 + o(1))\sqrt{n(1-p)/p}$ for, say, constant values of $p$. Given a graph $G = (V,E)$, we consider the matrix $M = M(G) = (m_{ij})_{i,j=1,\dots,n}$, where for $i \ne j$

$$m_{ij} = \begin{cases} 1 & \text{if } \{i,j\} \notin E,\\ (p-1)/p & \text{otherwise,}\end{cases} \qquad (7)$$

and $m_{ii} = 1$ for all $i$. Then $\lambda_1(M) \ge \vartheta(G)$. Moreover, as $p$ is constant, the result of Füredi and Komlós [13] on the eigenvalues of random matrices applies and yields that $\vartheta(G_{n,p}) \le \lambda_1(M) \le (2 + o(1))\sqrt{n(1-p)/p}$ whp. This argument carries over to the case $\ln(n)^6/n \le p \le 1/2$ (cf. [4]):

Lemma 12. Let $\ln(n)^6/n \le p \le 1/2$. Then $\|M(G_{n,p})\| \le 3\sqrt{n/p}$ whp.
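As an illustration of the bound $\vartheta(G) \le \lambda_1(M)$, the following sketch builds the matrix $M(G)$ of (7) and computes its largest eigenvalue. Since $\alpha(G) \le \vartheta(G)$, the bound can be sanity-checked on small graphs against a brute-force independence number. The function names, the 0-based vertex indexing, and the use of a hand-written cyclic Jacobi eigenvalue routine (in place of a numerical library) are assumptions of this sketch, which is meant for tiny graphs only.

```python
import itertools
import math
import random


def juhasz_matrix(n, edges, p):
    """M(G) from (7): 1 on the diagonal and on non-edges, (p-1)/p on edges."""
    m = [[1.0] * n for _ in range(n)]
    for i, j in edges:
        m[i][j] = m[j][i] = (p - 1.0) / p
    return m


def symmetric_eigenvalues(a, sweeps=100, tol=1e-20):
    """Eigenvalues of a small real symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for u in range(n - 1):
            for w in range(u + 1, n):
                if a[u][w] == 0.0:
                    continue
                # Rotation angle that annihilates the (u, w) entry.
                theta = (a[w][w] - a[u][u]) / (2.0 * a[u][w])
                sign = 1.0 if theta >= 0 else -1.0
                t = sign / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A R (columns u and w)
                    aku, akw = a[k][u], a[k][w]
                    a[k][u], a[k][w] = c * aku - s * akw, s * aku + c * akw
                for k in range(n):  # A <- R^T A (rows u and w)
                    auk, awk = a[u][k], a[w][k]
                    a[u][k], a[w][k] = c * auk - s * awk, s * auk + c * awk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def independence_number(n, edges):
    """Brute-force alpha(G); only for very small n."""
    adjacent = {frozenset(e) for e in edges}
    for size in range(n, 0, -1):
        for subset in itertools.combinations(range(n), size):
            if all(frozenset(pair) not in adjacent
                   for pair in itertools.combinations(subset, 2)):
                return size
    return 0


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 8, 0.3
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]
    lam1 = symmetric_eigenvalues(juhasz_matrix(n, edges, p))[0]
    print("lambda_1(M) =", lam1, ">= alpha(G) =", independence_number(n, edges))
```

The weaker inequality $\lambda_1(M) \ge \alpha(G)$ checked here also has a one-line reason: for the indicator vector of a maximum independent set, the Rayleigh quotient of $M$ equals $\alpha(G)$, because all entries of $M$ inside an independent set are 1.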

However, it is easily seen that in the sparse case, e.g. if $np = O(1)$, we have $\lambda_1(M) \gg n$ whp. The reason is that in the case $np \ge \ln(n)^6$ the random graph $G_{n,p}$ is “almost regular”, which is not true if $np = O(1)$. We will get around this problem by chopping off all vertices of degree considerably larger than $np$, as first proposed in [1]. Thus, let $\varepsilon > 0$ be a small constant, and consider the graph $G' = (V', E')$ obtained from $G = G_{n,p}$ by deleting all vertices of degree greater than $(1+\varepsilon)np$.
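The pruning step that produces $G'$ is simple to state in code. In this sketch (hypothetical function name, vertices numbered from 0), degrees are measured in the original graph $G$ and are not recomputed after deletions, matching the description above; every surviving vertex then has degree at most $(1+\varepsilon)np$ in $G'$.

```python
import random


def prune_high_degree(n, edges, p, eps):
    """Delete every vertex whose degree in G exceeds (1 + eps) * n * p.

    Returns the kept vertex set V' and the edge list of G' = G[V'].
    """
    threshold = (1.0 + eps) * n * p
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    kept = {v for v in range(n) if degree[v] <= threshold}
    kept_edges = [(i, j) for i, j in edges if i in kept and j in kept]
    return kept, kept_edges


if __name__ == "__main__":
    rng = random.Random(2)
    n, eps = 1000, 0.5
    p = 5.0 / n
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]
    kept, _ = prune_high_degree(n, edges, p, eps)
    # Compare with the bound 1/p on the number of deleted vertices (L. 17).
    print("deleted", n - len(kept), "vertices; 1/p =", 1 / p)
```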

Lemma 13. Suppose that $c_0/n \le p \le \ln(n)^6/n$ for some large constant $c_0$. Let $G = G_{n,p}$, and let $M' = M(G')$. Then $P(\|M'\| \le c_1\sqrt{n/p}) \ge 9/10$, where $c_1 > 0$ denotes some constant.

To prove L. 13, we make use of the following lemma, which is implicit in [10, 
Sections 2 and 3]; the proof is based on the method of Kahn and Szemeredi [11]. 

Lemma 14. Let $G = G_{n,p}$ be a random graph, where $c_0/n \le p \le \ln(n)^6/n$ for some large constant $c_0 > 0$. Let $n' = \#V(G')$, $e = n'^{-1/2}\mathbf{1} \in \mathbb{R}^{n'}$, and $A' = A(G')$. For each $\delta > 0$ there is a constant $C(\delta) > 0$ such that in the case $np \ge C(\delta)$ with probability $\ge 1 - \delta$ we have

$$\max\{|\langle A'v, e\rangle|, |\langle A'v, w\rangle|\} \le c_1\sqrt{np} \quad \text{for all } v, w \perp \mathbf{1},\ \|v\| = \|w\| = 1. \qquad (8)$$

Here $c_1 > 0$ denotes a certain constant.

In addition, the proof of Lemma 13 needs the following observation. 

Lemma 15. Let $c_1$ be a large constant. The probability that in $G = G_{n,p}$ there exists a set $U \subset V$, $\#U \ge n/2$, such that $|\#E(G[U]) - \#U^2p/2| > \frac{c_1}{4}n\sqrt{np}$ is less than $\exp(-n)$.

Proof. There are at most $2^n$ sets $U$. By Chernoff bounds (cf. [18, p. 26]), for a fixed $U$ the probability that $|\#E(G[U]) - \#U^2p/2| > \frac{c_1}{4}n\sqrt{np}$ is at most $\exp(-2n)$, provided that $c_0, c_1$ are large enough. Since $2^n\exp(-2n) < \exp(-n)$, the assertion follows. □
Proof of Lemma 13. Let $G = G_{n,p}$, let $n' = \#V(G')$, and let $A'$, $e$ be as in L. 14. Without loss of generality, we may assume that $V(G') = \{1, \dots, n'\}$. Let $c_1 > 0$ be a sufficiently large constant. Let $J$ signify the $n' \times n'$ matrix with all entries equal to 1. Letting $\delta > 0$ be sufficiently small and $c_0 \ge C(\delta)$, we assume in the sequel that (8) holds, and that $G$ has the property stated in L. 15. Let $z \in \mathbb{R}^{n'}$, $\|z\| = 1$. Then we have a decomposition $z = \alpha e + \beta v$, where $\|v\| = 1$, $v \perp \mathbf{1}$, $\alpha^2 + \beta^2 = 1$. Since $\|M'z\| \le \|M'e\| + \|M'v\|$, it suffices to bound $\max_{v \perp e,\, \|v\| = 1}\|M'v\|$ and $\|M'e\|$.

Let $\pi: \mathbb{R}^{n'} \to \mathbb{R}^{n'}$ be the projection onto the space $\mathbf{1}^\perp$. Then $A'v = \pi A'v + \langle A'v, e\rangle e$, whence $\|A'v\| \le \|\pi A'v\| + c_1\sqrt{np}$ for all unit vectors $v \perp \mathbf{1}$. In order to bound $\|\pi A'v\| = \|\pi A'\pi v\|$, we estimate $\|\pi A'\pi\|$ via (8):

$$\|\pi A'\pi\| = \sup_{\|y\|=1}|\langle \pi A'\pi y, y\rangle| = \sup_{\|y\|=1}|\langle A'\pi y, \pi y\rangle| \le \sup_{\|y\|=1,\ y\perp\mathbf{1}}|\langle A'y, y\rangle| \le c_1\sqrt{np}.$$

Consequently, $\|M'v\| = \|(J - p^{-1}A')v\| = p^{-1}\|A'v\| \le 2c_1\sqrt{n/p}$ ($v \perp \mathbf{1}$, $\|v\| = 1$).

To bound $\|M'e\|$, note that $-pM' = A' - pJ$. Let $d = 2\#E(G')/n'$, and $x = A'e - (d/n')Je$. Then $x \perp \mathbf{1}$, and by (8) we have $\|x\|^2 = \langle A'e, x\rangle - \langle (d/n')Je, x\rangle = \langle A'e, x\rangle \le c_1\sqrt{np}\,\|x\|$, whence $\|x\| \le c_1\sqrt{np}$. By L. 15, $|d - n'p| \le c_1\sqrt{np}$. As a consequence (since $\|Je\| = n'$), $\|(d/n')Je - pJe\| = |d - n'p| \le c_1\sqrt{np}$. Therefore, $\|pM'e\| \le \|x\| + \|(d/n')Je - pJe\| \le 2c_1\sqrt{np}$, i.e. $\|M'e\| \le 2c_1\sqrt{n/p}$. □

4.3 Bounding $\vartheta_2(G_{n,p})$ from Above

Let $c_0/n \le p \le 1/2$ for some large constant $c_0 > 0$. The following lemma is a consequence of the characterization of $\vartheta_2$ as an eigenvalue minimization problem given in [27].

Lemma 16. Let $G$ be any graph. Let $M = M(G)$. Then $\lambda_1(M) \ge \vartheta_2(G)$.

In the case $\ln(n)^6/n \le p \le 1/2$, combining L. 12 and L. 16 yields that $\vartheta_2(G_{n,p}) \le c_2\sqrt{n/p}$ whp. for some constant $c_2 > 0$, as desired. Thus, let us assume that $c_0/n \le p \le \ln(n)^6/n$ in the sequel. Let $\varepsilon > 0$ be a small constant.

Lemma 17. With probability at least 9/10 the random graph $G_{n,p}$ has at most $1/p$ vertices of degree greater than $(1+\varepsilon)np$.

Proof. For each vertex $v$ of $G_{n,p}$, the degree $d(v)$ is binomially distributed with mean $(n-1)p$. By Chernoff bounds (cf. [18, p. 26]), the probability that $d(v) > (1+\varepsilon)np$ is at most $\exp(-\varepsilon^2np/100)$. Hence, the expected number of vertices $v$ such that $d(v) > (1+\varepsilon)np$ is at most $n\exp(-\varepsilon^2np/100) \le 1/(10p)$, provided $np \ge c_0$ for some large constant $c_0 > 0$. Therefore, the assertion follows from Markov's inequality. □
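For completeness, the Markov step reads as follows, writing $Z$ (a symbol introduced only here) for the number of vertices of degree greater than $(1+\varepsilon)np$:

$$P(Z > 1/p) \le \frac{E(Z)}{1/p} \le p \cdot n\exp(-\varepsilon^2 np/100) \le p \cdot \frac{1}{10p} = \frac{1}{10}.$$

The inequality $n\exp(-\varepsilon^2np/100) \le 1/(10p)$ is equivalent to $10np \le \exp(\varepsilon^2np/100)$, which holds once $np$ exceeds a constant depending only on $\varepsilon$; this is where the assumption $np \ge c_0$ enters.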

Let $G = G_{n,p}$, and let $G' = (V', E')$ be the graph obtained from $G$ by deleting all vertices of degree greater than $(1+\varepsilon)np$. Let $V'' = V \setminus V'$, and $G'' = G[V'']$. Combining L. 17 and L. 13 (together with L. 16), we obtain that

$$P\left(\vartheta_2(G') \le c_2\sqrt{n/p} \text{ and } \vartheta_2(G'') \le \#V(G'') \le 1/p \le \sqrt{n/p}\right) \ge 1/2,$$

where $c_2$ denotes a suitable constant. Consequently, Prop. 8 yields that $P(\vartheta_2(G_{n,p}) \le (c_2 + 1)\sqrt{n/p}) \ge 1/2$.

Let $\mu = (c_2 + 1)\sqrt{n/p}$, $t = \ln(n)\sqrt{n}$, and note that $t = o(\sqrt{n/p})$. Then, by Thm. 9, $P(\vartheta_2(G_{n,p}) \ge \mu + t) \le 30\exp(-\Omega(\ln(n)^2)) = o(1)$. Since $t \le \sqrt{n/p}$, we get that $\vartheta_2(G_{n,p}) \le (c_2 + 2)\sqrt{n/p}$ with high probability.



4.4 Bounding $\bar\vartheta_2(G_{n,p})$ from Above

Let us first assume that $\ln(n)^6/n \le p \le 1/2$. Let $G = (V,E) = G_{n,p}$ be a random graph, and consider the matrix $\bar M = \frac{1}{1-p}E_n - \frac{p}{1-p}M(G)$, where $E_n$ is the $n \times n$ unit matrix, and $M(G)$ is the matrix defined in (7). Combining L. 12 and L. 16, we have $\bar\vartheta_2(G) \le \lambda_1(\bar M) \le \left\|\frac{1}{1-p}E_n - \frac{p}{1-p}M(G)\right\| \le \frac{p}{1-p}\|M(G)\| + 2 \le c_4\sqrt{np}$ whp., where $c_4 > 0$ is a certain constant.

Now let $c_0/n \le p \le \ln(n)^6/n$ for some large constant $c_0 > 0$. In this case, the proof of our upper bound on $\bar\vartheta_2(G_{n,p})$ relies on the concentration result Thm. 2.

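Lemma 18. Whp. the random graph $G = G_{n,p}$ admits no set $U \subset V$, $\#U \le 1/p$, such that $\chi(G[U]) > \sqrt{np}$.

Proof. We shall prove that for all $U \subset V$, $\#U = \nu \le 1/p$, we have $\#E(G[U]) < \nu\sqrt{np}/2$. Then each subgraph $G[U]$ has a vertex of degree $< \sqrt{np}$, a fact which immediately implies our assertion. Thus, let $\nu \le 1/p$. The probability that there exists some $U \subset V$, $\#U = \nu$, $\#E(G[U]) \ge \nu\sqrt{np}/2$, is at most

$$\binom{n}{\nu}\binom{\binom{\nu}{2}}{\nu\sqrt{np}/2}p^{\nu\sqrt{np}/2} \le \left[\frac{en}{\nu}\left(\frac{e\nu\sqrt{p}}{\sqrt{n}}\right)^{\sqrt{np}/2}\right]^{\nu}.$$

Let $b_\nu = (en/\nu)(e\nu\sqrt{p}/\sqrt{n})^{\sqrt{np}/2}$. Observe that the sequence $(b_\nu)_{\nu=1,\dots,n}$ is monotone increasing, and that $b_{1/p} = enp\,(e/\sqrt{np})^{\sqrt{np}/2} \le \exp(-2)$. Therefore, $\sum_{\ln(n) < \nu \le 1/p} b_\nu^\nu \le p^{-1}b_{1/p}^{\ln(n)} \le n^{-2}p^{-1} = o(1)$. Moreover, if $\nu \le \ln(n)$, then $b_\nu \le en\nu^{-1}(e\nu\sqrt{p}/\sqrt{n})^{\sqrt{np}/2} < 1/n$, whence $\sum_{\nu \le \ln(n)} b_\nu^\nu = o(1)$. Thus, $\sum_{\nu=1}^{1/p} b_\nu^\nu = o(1)$, thereby proving the lemma. □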
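The step from “each subgraph has a vertex of degree $< \sqrt{np}$” to a coloring bound is the standard degeneracy argument: repeatedly remove a minimum-degree vertex, then color greedily in reverse removal order. If every removed vertex had degree at most $d$ at the time of removal, at most $d + 1$ colors are used. A minimal sketch, with a hypothetical function name and vertices numbered from 0:

```python
def degeneracy_coloring(n, edges):
    """Greedy coloring along a degeneracy (smallest-last) ordering.

    Returns (coloring, d), where d is the largest degree a vertex had at the
    moment it was removed; the coloring uses at most d + 1 colors.
    """
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    remaining = set(range(n))
    deg = {v: len(adj[v]) for v in range(n)}
    order, d = [], 0
    while remaining:
        v = min(remaining, key=lambda u: deg[u])  # min degree in current subgraph
        d = max(d, deg[v])
        order.append(v)
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                deg[w] -= 1
    coloring = {}
    for v in reversed(order):
        # At most deg[v] <= d neighbors of v are already colored at this point.
        used = {coloring[w] for w in adj[v] if w in coloring}
        c = 0
        while c in used:
            c += 1
        coloring[v] = c
    return coloring, d
```

In the setting of L. 18 every such $d$ is below $\sqrt{np}$, so $G[U]$ receives fewer than $\sqrt{np} + 1$ colors.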

Let $G = (V,E) = G_{n,p}$ be a random graph, and let $G' = (V',E')$ be the graph obtained from $G$ by removing all vertices of degree greater than $(1+\varepsilon)np$, where $\varepsilon > 0$ is small but constant. Let $V'' = V \setminus V'$, and let $G'' = G[V'']$. By L. 17, with probability at least 9/10 we have $\#V'' \le 1/p$. Therefore, by L. 18, $P(\bar\vartheta_2(G'') \le \sqrt{np}) \ge P(\chi(G'') \le \sqrt{np}) \ge 9/11$. To bound $\bar\vartheta_2(G')$, we consider the matrix $\bar M' = \frac{1}{1-p}E_{n'} - \frac{p}{1-p}M(G')$. By L. 16, $\bar\vartheta_2(G') \le \lambda_1(\bar M')$. Moreover, by L. 13, with probability $\ge 9/10$ we have $\bar\vartheta_2(G') \le \lambda_1(\bar M') \le \frac{p}{1-p}\|M'\| + 2 \le c_4\sqrt{np}$, for some constant $c_4 > 0$. Prop. 8 implies that $\bar\vartheta_2(G) \le \bar\vartheta_2(G') + \bar\vartheta_2(G'')$, whence we conclude that $P(\bar\vartheta_2(G_{n,p}) \le (c_4 + 1)\sqrt{np}) \ge 1/2$. Since Thm. 2 shows that $\bar\vartheta_2(G_{n,p})$ is concentrated in width one, we have

$$P\left(\bar\vartheta_{1/2}(G_{n,p}) \le \bar\vartheta(G_{n,p}) \le \bar\vartheta_2(G_{n,p}) \le (c_4 + 1)\sqrt{np} + 1\right) = 1 - o(1),$$

thereby completing the proof of Thm. 3.
Remark 19. One could prove slightly weaker results on the probable value of $\vartheta(G_{n,p})$ and $\bar\vartheta(G_{n,p})$ than provided by Thm. 3 without applying any concentration results, or bounds on the SDP relaxation SMC of MAX CUT. Indeed, using only L. 17, 18, 13 (thus implicitly [10]) and the estimates proposed in [19], one could show that for each $\delta > 0$ there is $C(\delta) > 0$ such that $P(c_1\sqrt{n/p} \le \vartheta(G_{n,p}) \le c_2\sqrt{n/p}) \ge 1 - \delta$ and $P(c_3\sqrt{np} \le \bar\vartheta(G_{n,p}) \le c_4\sqrt{np}) \ge 1 - \delta$, provided $np \ge C(\delta)$. Such an approach is mentioned without proof independently in the latest version of [10].

5 Approximating the Independence Number and Deciding $k$-Colorability

Approximating the Independence Number. The algorithm ApproxMIS for approximating the independence number consists of two parts. First, we employ a certain greedy procedure that on input $G = G_{n,p}$ finds a large independent set whp. Secondly, we compute $\vartheta(G)$ to bound $\alpha(G)$ from above. Following [23], to find a large independent set of $G = G_{n,p}$, we run the greedy algorithm for graph coloring and pick the largest color class it produces.
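For concreteness, the greedy coloring procedure can be sketched as follows: vertices are processed in a fixed order, each receives the smallest color not used by an already colored neighbor, and the largest color class is returned. The vertex order (default $0, \dots, n-1$) and the function name are assumptions of this sketch, since the text does not fix them; every color class is an independent set by construction.

```python
import math
import random


def greedy_largest_color_class(n, edges, order=None):
    """Greedy coloring of G; returns the largest color class.

    Each vertex gets the smallest color not used by an already colored
    neighbor, so every color class is an independent set.
    """
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    color = {}
    classes = []
    for v in (range(n) if order is None else order):
        used = {color[w] for w in adj[v] if w in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
        if c == len(classes):  # a new color is opened
            classes.append([])
        classes[c].append(v)
    return max(classes, key=len) if classes else []


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 300, 0.05
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]
    largest = greedy_largest_color_class(n, edges)
    # Compare with the threshold ln(np)/(2p) used in L. 20 and Algorithm 21.
    print(len(largest), "vs ln(np)/(2p) =", math.log(n * p) / (2 * p))
```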

Lemma 20. The probability that the largest color class produced by the greedy coloring algorithm contains $< \ln(np)/(2p)$ vertices is at most $\exp(-n)$.

Proof. The proof given in [23] for the case $p \gg n^{-1/2+\varepsilon}$ carries over. □

The following algorithm is essentially identical to the one given in [4].

Algorithm 21. ApproxMIS($G$)

Input: A graph $G = (V,E)$. Output: An independent set of $G$.

1. Run the greedy algorithm for graph coloring on input $G$. Let $I$ be the largest resulting color class. If $\#I < \ln(np)/(2p)$, then go to 5.

2. Compute $\vartheta(G)$. If $\vartheta(G) \le C\sqrt{n/p}$, then output $I$ and terminate. Here $C$ denotes some sufficiently large constant (cf. the analysis below).

3. Check whether there exists a subset $S$ of $V$, $\#S = 25\ln(np)/p$, such that $\#V \setminus (S \cup N(S)) \ge 12(n/p)^{1/2}$. If no such set exists, then output $I$ and terminate.

4. Check whether in $G$ there is an independent set of size $12(n/p)^{1/2}$. If this is not the case, then output $I$ and terminate.

5. Enumerate all subsets of $V$ and output a maximum independent set.



Lemma 22. The expected running time of ApproxMIS($G_{n,p}$) is polynomial.

Proof. The first two steps can be implemented in polynomial time. By Thm. 3, the median $\mu$ of $\vartheta(G_{n,p})$ is at most $c\sqrt{n/p}$, for some constant $c$. Therefore, Thm. 9 entails that the probability that ApproxMIS runs step 3 is less than $\exp(-(n/p)^{1/2})$, provided $C$ is large enough. Furthermore, up to polynomial factors, step 3 consumes time $\le \exp(25\ln(np)^2/p) \le \exp(\frac{1}{2}\sqrt{n/p})$. Hence, the expected time spent executing step 3 is polynomial. Taking into account L. 20, the expected running time of the remaining steps can be estimated as in the proof of Thm. 4 in [4]. □

Finally, it is not hard to show that ApproxMIS guarantees the desired approximation ratio.

Deciding $k$-Colorability. Following [22], we decide $k$-colorability by computing the vector chromatic number of the input graph. Let $k = k(n)$ be a sequence of positive integers such that $k(n) = o(\sqrt{n})$. Since the vector chromatic number is always a lower bound on the chromatic number, the answer of the following algorithm is correct for all input graphs $G$.

Algorithm 23. Decide$_k$($G$)

Input: A graph $G = (V,E)$. Output: Either “$\chi(G) \le k$” or “$\chi(G) > k$”.

1. If $\bar\vartheta_{1/2}(G) > k$, then terminate with output “$\chi(G) > k$”.

2. Otherwise, compute $\chi(G)$ in time $o(\exp(n))$ using Lawler's algorithm [24], and answer correctly.
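Step 2 falls back on exact computation. The sketch below is a simplified variant of the subset dynamic program behind Lawler's algorithm, $\chi(S) = 1 + \min_I \chi(S \setminus I)$ over nonempty independent sets $I \subseteq S$. Lawler's algorithm restricts $I$ to maximal independent sets to obtain its better running time; this version enumerates all submasks and takes $O(3^n)$ time, so it is for small graphs only. Step 1 (computing $\bar\vartheta_{1/2}$ by semidefinite programming) is not shown, and the function names are hypothetical.

```python
def chromatic_number(n, edges):
    """Exact chromatic number by dynamic programming over vertex subsets.

    chi(S) = 1 + min chi(S minus I), the minimum taken over all nonempty
    independent subsets I of S. Time O(3^n): small graphs only.
    """
    nbr = [0] * n
    for i, j in edges:
        nbr[i] |= 1 << j
        nbr[j] |= 1 << i
    size = 1 << n
    # independent[mask] is True iff the vertex set encoded by mask is independent.
    independent = [True] * size
    for mask in range(1, size):
        low = mask & -mask
        v = low.bit_length() - 1
        rest = mask ^ low
        independent[mask] = independent[rest] and not (nbr[v] & rest)
    chi = [0] * size
    for mask in range(1, size):
        best = n
        sub = mask
        while sub:  # iterate over all nonempty submasks of mask
            if independent[sub]:
                best = min(best, chi[mask ^ sub] + 1)
            sub = (sub - 1) & mask
        chi[mask] = best
    return chi[size - 1]


def decide_k_step2(n, edges, k):
    """Step 2 of Decide_k: answer the k-colorability question exactly."""
    return "chi(G) <= k" if chromatic_number(n, edges) <= k else "chi(G) > k"
```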



Lemma 24. Suppose that $p \ge Ck^2/n$ for some large constant $C$. Then the expected running time of Decide$_k$($G^+_{n,p}$) is polynomial.

Proof. In [20] it is shown that $\bar\vartheta_{1/2}$ can be computed in polynomial time. Since the second step consumes time $o(\exp(n))$, and since adding edges cannot decrease $\bar\vartheta_{1/2}$, (2) shows that the expected running time of Decide$_k$ on input $G^+_{n,p}$ is polynomial. □
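Spelled out, with $T$ (a symbol introduced only here) denoting the running time of Decide$_k$ on $G^+_{n,p}$, and assuming $C$ is large enough that $c_3\sqrt{np} \ge c_3\sqrt{C}\,k > k$ with $c_3$ from Thm. 3:

$$E(T) \le \mathrm{poly}(n) + P\left(\bar\vartheta_{1/2}(G^+_{n,p}) \le k\right) o(e^n) \le \mathrm{poly}(n) + P\left(\bar\vartheta_{1/2}(G_{n,p}) < c_3\sqrt{np}\right) o(e^n) \le \mathrm{poly}(n) + e^{-n}\,o(e^n).$$

The second inequality uses that the adversary only adds edges, which cannot decrease $\bar\vartheta_{1/2}$, together with $k < c_3\sqrt{np}$; the third is (2).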

The analysis of Decide$_k$ on input $G_{n,r}$, $r \ge Ck^2$, is based on Thm. 4 and yields the proof of Thm. 7.
Abstract. We investigate randomized processes underlying load balancing based on the multiple-choice paradigm: $m$ balls have to be placed in $n$ bins, and each ball can be placed into one out of 2 randomly selected bins. The aim is to distribute the balls as evenly as possible among the bins. Previously, it was known that a simple process that places the balls one by one in the least loaded bin can achieve a maximum load of $m/n + \Theta(\log\log n)$ with high probability. Furthermore, it was known that it is possible to achieve (with high probability) a maximum load of at most $\lceil m/n\rceil + 1$ using maximum flow computations.

In this paper, we extend these results in several aspects. First of all, we show that if $m \ge cn\log n$ for some sufficiently large $c$, then a perfect distribution of balls among the bins can be achieved (i.e., the maximum load is $\lceil m/n\rceil$) with high probability. The bound for $m$ is essentially optimal, because it is known that if $m \le c'n\log n$ for some sufficiently small constant $c'$, the best possible maximum load that can be achieved is $\lceil m/n\rceil + 1$ with high probability. Next, we analyze a simple, randomized load balancing process based on a local search paradigm. Our first result here is that this process always converges to a best possible load distribution. Then, we study the convergence speed of the process. We show that if $m$ is sufficiently large compared to $n$, then no matter with which ball distribution the system starts, if the imbalance is $\Delta$, then the process needs only $\Delta \cdot n^{O(1)}$ steps to reach a perfect distribution, with high probability. We also prove a similar result for $m \approx n$, and show that if $m = O(n\log n/\log\log n)$, then an optimal load distribution (which has the maximum load of $\lceil m/n\rceil + 1$) is reached by the random process after a polynomial number of steps, with high probability.



Keywords: load balancing, local search algorithms, stochastic processes. 

1 Introduction 

The study of balls-into-bins games or occupancy problems has a long history (see e.g. [1,2,3,4,5,8,10,11,12,18]). These problems have numerous applications, e.g., in graph theory, queueing theory, hashing, and randomized rounding. In general, the goal of a balls-and-bins algorithm is to assign a set of independent objects (tasks, jobs, memory blocks) to a set of resources (servers, disks) so that the load is distributed among the bins as evenly as possible.

In the classical single-choice game, each ball is placed into a bin chosen independently and uniformly at random (i.u.r.). For the case of $n$ bins and $m \ge n\log n$ balls it is well known that there exists a bin receiving $m/n + \Theta(\sqrt{m\log n/n})$ balls. This result holds not only in expectation but also with high probability. (We say that an event $A$ occurs with high probability (w.h.p.) if $\Pr[A] \ge 1 - n^{-\alpha}$ for an arbitrarily chosen constant $\alpha \ge 1$.) On the other hand, it was shown by Azar et al. [1] and Berenbrink et al. [2] that if the balls are placed in a sequential (on-line) fashion and each ball is assigned to the currently least loaded of its two locations (ties broken arbitrarily), then the maximum load of any bin is $m/n + O(\log\log n)$ with high probability. It can also be proven [1,2] that any protocol that assigns the balls to the bins in an on-line fashion (that is, the decision where a ball is placed is made only on the basis of the placement of the previously placed balls) cannot be stochastically better than the scheme above. In particular, this implies that in any on-line scheme, with high probability, there is a bin with load $m/n + \Omega(\log\log n)$.

On the other hand, some authors have been studying off-line assignments. In off-line
assignments, after first selecting the two locations for all the balls, one seeks an optimal 
placement of the balls assuming each ball can choose only among its two locations 
and the locations of all balls are known to the algorithm (off-line case). This problem 
arises naturally in numerous applications, for example, in hashing, scheduling, load 
balancing, and video on demand (see, e.g., [1,7,9,14,15,16]). (For example, Sanders 
et al. [16] discussed in depth applications to support fast parallel access to external 
memory systems with parallel disks and Karp [7] discussed applications in video on 
demand; Karp called our problem $k$-orientability.)

Let the minmax load be the minimum, over all possible placements of the balls into bins, of the maximum load in the system. Azar et al. [1] showed that for $n = \Theta(m)$, the minmax load is $O(1)$, with high probability. Later, Frieze (personal communication in [1]) and, independently, Czumaj and Stemann [5], tightened this bound and, in particular, showed that for $n = m$, the minmax load is exactly 2, with high probability. Sanders et al. [16] extended the result from [1,5] to arbitrary $m$ and proved the following result.

Theorem 1. [16] The minmax load is at most $\lceil m/n\rceil + 1$, with high probability. □

Notice that since the minmax load cannot be smaller than $\lceil m/n\rceil$, this bound is optimal up to an additive constant 1. Furthermore, it is easy to see that there exists a positive constant $\lambda$ such that if $m \le \lambda n\ln n$, then the bound in Theorem 1 is tight¹. Our first contribution is that this bound for $m$ is asymptotically tight in the following sense: there is a constant $c$ such that if $m \ge cn\ln n$, then a perfect balance is possible:

Theorem 2. There exists a positive constant $c$ such that for every $m \ge cn\ln n$, the minmax load is exactly $\lceil m/n\rceil$, with high probability.

¹ Indeed, if we choose at random two locations for each of the $\lambda n\ln n$ balls, then w.h.p. there will be a bin that has not been chosen by any ball. Therefore, there is a bin whose load is 0 w.h.p., and hence it is impossible that all bins have identical load of $m/n$, w.h.p.
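The footnote's claim can be quantified by a first-moment calculation, sketched here under the assumption that each ball picks its two locations independently and uniformly at random, with $m = \lambda n\ln n$ and $Z$ (a symbol introduced only here) counting the bins chosen by no ball:

$$E(Z) = n\left(1 - \frac{1}{n}\right)^{2m} \ge n\exp\left(-\frac{2m}{n-1}\right) = n^{1 - 2\lambda n/(n-1)} \to \infty \quad (\lambda < 1/2),$$

using $1 - 1/n \ge \exp(-1/(n-1))$. Turning $E(Z) \to \infty$ into $Z > 0$ w.h.p. needs an additional second-moment (or similar concentration) argument, which is not spelled out here.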
Stochastic load balancing. Next, we present a novel approach to off-line assignments and discuss a new stochastic process (algorithm) that achieves optimal maximum load. Sanders et al. [16] described a polynomial time algorithm that finds an optimal assignment of the balls into bins minimizing the maximum load (which in this optimal allocation is equal to the minmax load). Their algorithm uses maximum flow computations.

A drawback of the approach by Sanders et al. is that it requires global (centralized) knowledge about locations of all balls, which is far too space consuming if $m$ is large. This also makes the algorithm difficult to implement (if suitable at all) in distributed or decentralized systems (like, for example, systems of parallel disks as discussed in [9,16]). Therefore, as our second contribution, we present a simple, memoryless, local search algorithm that can balance the load of the bins in the system as much as possible. The idea behind our algorithm is to begin with an arbitrary assignment of the balls to the bins, and then to use a stochastic replacement process that gradually improves the balance of the bins' load.

Suppose that initially all the balls have chosen their locations in $\{1, \ldots, n\}$ and each ball is (arbitrarily) placed in one of its two locations. The Self-Balancing Algorithm repeats the following Self-Balancing Step:



Self-Balancing Step:

Pick independently and uniformly at random a pair of bins $(b_1, b_2)$.
If there is a ball placed in $b_1$ with alternative location in bin $b_2$, then
  Pick any ball $x$ that is placed in $b_1$ with alternative location in bin $b_2$;
  Place $x$ into the least loaded bin (among $b_1$ and $b_2$);
  If tie, that is, bin $b_1$ has (without $x$) the same load as bin $b_2$, then
    place $x$ into a randomly chosen one of the two bins.
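For concreteness, one Self-Balancing Step can be sketched in Python as below. This is an illustrative simulation under our own assumptions, not the authors' code: each ball $x$ stores its two locations in `loc[x]` and the index of the occupied one in `where[x]`, bin loads live in `load`, the initial placement puts every ball in its first location (an arbitrary choice the text allows), and "pick any ball" is resolved by taking the first candidate. The names `random_instance` and `self_balancing_step` are hypothetical.

```python
import random


def random_instance(m, n, rng):
    """Each ball picks two locations i.u.r. and starts (arbitrarily) in its first one."""
    loc = [(rng.randrange(n), rng.randrange(n)) for _ in range(m)]
    where = [0] * m                      # ball x currently sits in loc[x][where[x]]
    load = [0] * n
    for p, _ in loc:
        load[p] += 1
    return loc, where, load


def self_balancing_step(loc, where, load, rng):
    """Perform one Self-Balancing Step in place; return True iff a ball moved."""
    n = len(load)
    b1, b2 = rng.randrange(n), rng.randrange(n)   # independent, uniform pair of bins
    # Balls placed in b1 whose alternative location is b2.
    cands = [x for x in range(len(loc))
             if loc[x][where[x]] == b1 and loc[x][1 - where[x]] == b2]
    if not cands:
        return False
    x = cands[0]                 # "pick any ball": here, simply the first candidate
    rest = load[b1] - 1          # load of b1 without x
    # Move x to b2 if b2 is strictly less loaded, or on a tie with probability 1/2.
    if rest > load[b2] or (rest == load[b2] and rng.random() < 0.5):
        where[x] = 1 - where[x]
        load[b1] -= 1
        load[b2] += 1
        return True
    return False


rng = random.Random(1)
loc, where, load = random_instance(200, 20, rng)
for _ in range(5000):
    self_balancing_step(loc, where, load, rng)
print(max(load))
```

A ball only moves to a bin whose new load is at most the old load of the bin it leaves, so the maximum load can never increase during such a run.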



We prove two theorems about the Self-Balancing Algorithm (throughout our analysis, unless stated otherwise, the term “with high probability” is with respect to the random choices of the two locations of each ball, as well as the random choices of balls in the Self-Balancing Algorithm).

The first theorem shows that the Self-Balancing Algorithm will gradually converge 
to states in which the maximum load is best possible. 

Theorem 3. If the Self-Balancing Algorithm is run sufficiently long (i.e., the Self-Balancing Step is repeated sufficiently many times), then the maximum load of any bin in the system is equal to the minmax load with probability 1. (The probability 1 is with respect to the random choices of balls in the Self-Balancing Algorithm only.)

In particular, if the Self-Balancing Algorithm is run sufficiently long, then the maximum load of any bin in the system is smaller than or equal to $\lceil m/n \rceil + 1$ with high probability. If, additionally, $m \ge cn \ln n$ for a sufficiently large constant $c$, then the maximum load is exactly $\lceil m/n \rceil$ with high probability.

The Self-Balancing Algorithm is a simple example of a local search algorithm, similar to load balancing algorithms existing in the literature before, see, e.g., [6,13]. Theorem 3 shows the non-trivial property that no matter in which state (i.e., assignment of balls to bins) the Self-Balancing Algorithm starts, it will always converge to a state in which the maximum load is optimally small. Notice that in many local search approaches one frequently arrives at a “dead-lock” situation, in which the balancing may be far from optimal and no re-balancing progress is possible (that is, a locally optimal solution is not a global optimum). Theorem 3 shows that this is not the case for the Self-Balancing Algorithm. (Observe, however, that if we removed the randomized rule for tie breaking, then, as one can easily show, the algorithm would not necessarily converge to an optimal state.)

The next theorem considers the heavily loaded case and deals with the speed of the “convergence” of the Self-Balancing Algorithm to a state in which the maximum load is upper bounded by $\lceil m/n \rceil$. Let the imbalance of the system be its distance from a best possible distribution, or more precisely, $\sum_{i=1}^{n} \max\{0, \text{load of bin } i - \lceil m/n \rceil\}$.

Theorem 4. If $m \gg n$, then after a polynomial number (with respect to $n$ only) of Self-Balancing Steps the maximum load in the system is equal to $\lceil m/n \rceil$, with high probability. Furthermore, if the system imbalance is $\Delta$, then the number of steps is $\Delta \cdot \mathrm{poly}(n)$ with high probability.

Notice that if the balls are allocated to the bins in the on-line fashion using the least loaded bin approach, as in [1,2], the system imbalance is $\Delta = O(n \log\log n)$, with high probability [2]. Therefore, Theorem 4 implies the following corollary.

Corollary 1. If $m \gg n$, then in time $O(m) + \mathrm{poly}(n)$ one can find a perfect load distribution with the maximum load of the system equal to $\lceil m/n \rceil$, with high probability. □

As we argued before, one cannot extend the result from Theorem 4 to the case $m \approx n$, because then the minmax load is expected to be equal to $\lceil m/n \rceil + 1$ (instead of $\lceil m/n \rceil$). Our next theorem shows, however, that if $m$ is close to $n$, then the Self-Balancing Algorithm still rapidly converges to the optimal distribution.

Theorem 5. If $m = O(n \log n / \log\log n)$, then after a polynomial number (with respect to $n$) of Self-Balancing Steps the maximum load in the system is smaller than or equal to $\lceil m/n \rceil + 1$, with high probability.

Notational conventions. To simplify the presentation of the paper, we will use the shorthand $\mu$ to denote $m/n$ and $\bar{\mu}$ to denote $\lceil m/n \rceil = \lceil \mu \rceil$. We shall identify the balls with the integers in $\{1, \ldots, m\} = [m]$ and the bins with the integers in $\{1, \ldots, n\} = [n]$.

Let the load of a bin $b \in [n]$ be equal to the number of balls placed in $b$. Notice that the average load among all the bins is $\mu$.



2 Perfect Balancing for $\Omega(n \log n)$ Balls

In this section we prove Theorem 2, that is, we show that if $m \ge cn \log n$ for a certain suitable constant $c$, then the minmax load is $\lceil m/n \rceil = \bar{\mu}$, with high probability. It is easy to see that it is sufficient to prove this bound in the case $\mu = \bar{\mu}$, and therefore from now on we assume that $\mu$ is an integer.
Let $\mathbb{B}$ denote the set of $n$ bins in the system. Let us fix an allocation of $m$ balls to $n$ bins in $\mathbb{B}$ such that each ball has two locations in $\mathbb{B}$ (we allow a ball to have both locations in the same bin). For any $U \subseteq \mathbb{B}$, let $\mathcal{T}[U]$ denote the number of balls having all locations in the bins in $U$. Then, one can show the following result (see [16,17]).



Lemma 1. [17, Theorem 1] The minmax load is equal to $\max_{\emptyset \neq U \subseteq \mathbb{B}} \left\lceil \frac{\mathcal{T}[U]}{|U|} \right\rceil$. □
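Lemma 1 can be checked by brute force on tiny instances. The Python sketch below (function names are hypothetical) evaluates the right-hand side of Lemma 1 by enumerating all nonempty subsets $U$ as bit masks, and compares it with the minimum, over all $2^m$ placements, of the maximum load. Both routines are exponential and are meant only as an illustration.

```python
from itertools import product


def minmax_by_lemma1(n, loc):
    """max over nonempty U of ceil(T[U] / |U|), with T[U] = #balls having both locations in U."""
    best = 0
    for mask in range(1, 1 << n):             # U encoded as a bit mask over the n bins
        size = bin(mask).count("1")
        inside = sum(1 for p, q in loc if (mask >> p) & 1 and (mask >> q) & 1)
        best = max(best, -(-inside // size))  # integer ceiling of inside / size
    return best


def minmax_brute_force(n, loc):
    """Minimum over all 2^m placements of the maximum bin load."""
    best = len(loc)
    for choice in product((0, 1), repeat=len(loc)):
        load = [0] * n
        for (p, q), c in zip(loc, choice):
            load[q if c else p] += 1
        best = min(best, max(load))
    return best


loc = [(0, 0), (0, 0), (0, 1), (1, 2)]        # two balls are forced into bin 0
print(minmax_by_lemma1(3, loc), minmax_brute_force(3, loc))   # 2 2
```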



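Consider the stochastic process of assigning the two locations of each of the $m$ balls to the $n$ bins in $\mathbb{B}$ i.u.r. For any set $U \subseteq \mathbb{B}$, let $C_U$ be the random variable denoting the value of $\mathcal{T}[U]$. Furthermore, let $\mathcal{E}_U$ be the random indicator of the event that $C_U > \mu \cdot |U|$ and let $\mathcal{E} = \bigvee_{U \subseteq \mathbb{B},\, U \neq \emptyset} \mathcal{E}_U$. (By Lemma 1, since $\mu$ is an integer, the minmax load exceeds $\mu$ if and only if $\mathcal{E}$ holds.) Our goal is to show that

$$\Pr[\mathcal{E}] \le n^{-\gamma} \qquad (1)$$

for a constant $\gamma$ depending on $c$.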

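Let $\mathbb{B}_k = \{U \subseteq \mathbb{B} : |U| = k\}$. Then, by the union bound, to prove (1) it is enough to prove the following bound for every $k$, $1 \le k \le n - 1$, and for every set $U \in \mathbb{B}_k$:

$$\Pr[\mathcal{E}_U] \le \frac{1}{n^{\gamma+1} \binom{n}{k}}. \qquad (2)$$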
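The union-bound step can be spelled out as follows (our own expansion of the argument). It uses that $\mathcal{E}_{\mathbb{B}}$ never holds, since the $m = \mu n$ balls cannot exceed $\mu \cdot |\mathbb{B}|$, and that $\mathbb{B}_k$ contains $\binom{n}{k}$ sets:

$$\Pr[\mathcal{E}] \le \sum_{k=1}^{n-1} \sum_{U \in \mathbb{B}_k} \Pr[\mathcal{E}_U] \le \sum_{k=1}^{n-1} \binom{n}{k} \cdot \frac{1}{n^{\gamma+1}\binom{n}{k}} = \frac{n-1}{n^{\gamma+1}} < n^{-\gamma}.$$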



From now on, we concentrate on proving inequality (2). Let us observe that for any set $U \in \mathbb{B}_k$ the value of $C_U$ is a binomial random variable with parameters $m$ and $(k/n)^2$ (each ball has both of its locations in $U$ with probability $(k/n)^2$, independently of the other balls), which we denote by $B(m, (k/n)^2)$. Therefore, $\Pr[\mathcal{E}_U] = \Pr[B(m, (k/n)^2) > m \cdot k/n] \le \Pr[B(m, (k/n)^2) \ge m \cdot k/n]$, and our goal now is to investigate bounds for $\Pr[B(m, (k/n)^2) \ge m \cdot k/n]$.

We begin with three simple results about concentration of binomial random variables. 



Lemma 2.

1. For any $t \ge 6mq^2$, $\Pr[B(m, q^2) \ge t] \le 2^{-t}$.
2. For any $0 < q < 1$, $\Pr[B(m, q^2) \ge q \cdot m] \le \exp(-2q^2(1-q)^2 m)$.
3. For any $0 < q < 1$, if $\ldots \le q$ for certain $u > 1$, then $\Pr[B(m, q^2) \ge q \cdot m] \le \ldots$
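Parts 1 and 2 of Lemma 2 are standard Chernoff and Hoeffding bounds. As a numerical sanity check (not a proof), the Python sketch below compares them with the exact binomial tail on a grid of values of $q$; `binom_tail` and `check_lemma2` are hypothetical helper names, and part 3 is left out.

```python
from math import ceil, comb, exp


def binom_tail(m, p, t):
    """Exact Pr[B(m, p) >= t] for a binomial random variable B(m, p)."""
    t = max(t, 0)
    return sum(comb(m, i) * p**i * (1 - p) ** (m - i) for i in range(t, m + 1))


def check_lemma2(m):
    """Check parts 1 and 2 of Lemma 2 for q in {0.05, 0.10, ..., 0.95}."""
    for step in range(1, 20):
        q = step / 20
        p = q * q
        # Part 1: for integers t >= 6 m q^2, Pr[B(m, q^2) >= t] <= 2^{-t}.
        t0 = ceil(6 * m * p)
        for t in range(t0, min(m, t0 + 30) + 1):
            if binom_tail(m, p, t) > 2.0 ** (-t):
                return False
        # Part 2: Pr[B(m, q^2) >= q m] <= exp(-2 q^2 (1 - q)^2 m).
        if binom_tail(m, p, ceil(q * m)) > exp(-2 * q**2 * (1 - q) ** 2 * m):
            return False
    return True


print(check_lemma2(200))   # True
```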



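Let $m \ge cn \ln n$ for a large constant $c$. Let $U \in \mathbb{B}_k$ and $q = k/n$. Let us first consider the case $k/n = q \le 0.1$. Then, if we set $t = mk/n$, we have $t \ge 6 \cdot \mathbf{E}[B(m, q^2)]$ (as $\mathbf{E}[B(m, q^2)] = mq^2 = qt \le 0.1\,t$), and hence by Lemma 2 (1) and by the inequality $\binom{n}{k} \le n^k$, we get (provided $c$ is a large enough constant):

$$\Pr[\mathcal{E}_U] \le 2^{-t} = 2^{-mk/n} \le e^{-(\gamma+1)\ln n - k \ln n} \le \frac{1}{n^{\gamma+1}\binom{n}{k}}. \qquad (3)$$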



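Next, we consider $0.1n < k \le 2n/3$. Then, by Lemma 2 (2) and by observing that $\binom{n}{k} \le 2^n$, we have (again, if we set $m \ge cn \ln n$ for a large enough constant $c$)

$$\Pr[\mathcal{E}_U] \le \exp\!\left(-2\left(\tfrac{k}{n}\right)^2\left(1 - \tfrac{k}{n}\right)^2 m\right) \le e^{-0.001m} \le e^{-(\gamma+1)\ln n - k \ln n} \le \frac{1}{n^{\gamma+1}\binom{n}{k}}. \qquad (4)$$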



We do not have to consider the case $U = \mathbb{B}$ because in that case $\mathcal{E}_U$ trivially never holds.
The remaining case is when $k/n = q > 2/3$. Then, we can apply Lemma 2 (3) with $u = 2.5$ to obtain

$$\Pr[\mathcal{E}_U] \le \Pr[B(m, q^2) \ge qm] \le (2.5/e)^{\Omega(m)} \le e^{-(\gamma+1)\ln n - k \ln n} \le \frac{1}{n^{\gamma+1}\binom{n}{k}}. \qquad (5)$$

Therefore, from inequalities (3)-(5), we have that for every integer $k$, $1 \le k \le n - 1$, and for every $U \in \mathbb{B}_k$, $\Pr[\mathcal{E}_U] \le \frac{1}{n^{\gamma+1}\binom{n}{k}}$. This implies that $\Pr[\mathcal{E}] \le n^{-\gamma}$, which in turn yields Theorem 2. □



3 Convergence to Optimal Assignment 

In this section we sketch the proof of Theorem 3. We begin with basic definitions and notation. A placement of the balls after performing $t$ repetitions of the Self-Balancing Step, $t \ge 0$, is called the $t$-th assignment, and is denoted by $\mathcal{A}_t$. To each assignment $\mathcal{A}_t$ we assign a load vector, which is a vector $L_t = (L_t(1), \ldots, L_t(n))$ such that $L_t(j)$ denotes the load of the $j$-th fullest bin in $\mathcal{A}_t$. For any two load vectors $L = (L(1), \ldots, L(n))$ and $L^* = (L^*(1), \ldots, L^*(n))$, we say $L$ majorizes $L^*$, denoted by $L \succeq L^*$, if for every $j$, $1 \le j \le n$, we have $\sum_{r=1}^{j} L(r) \ge \sum_{r=1}^{j} L^*(r)$. Furthermore, we write $L \succ L^*$ if $L \succeq L^*$ and there is at least one $j$ with $\sum_{r=1}^{j} L(r) > \sum_{r=1}^{j} L^*(r)$.

Our first lemma describes the way the load vector can change in the course of the algorithm. Informally, it says that after any repetition of the Self-Balancing Step the load vector never worsens.

Lemma 3. For any $t \ge 0$, independently of the random choices performed by the Self-Balancing Algorithm, we always have $L_t \succeq L_{t+1}$. □
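The relations $\succeq$ and $\succ$ can be sketched in Python as follows (function names are our own). In these terms, Lemma 3 says that `majorizes(L_t, L_{t+1})` holds after every Self-Balancing Step.

```python
def prefix_sums(loads):
    """Prefix sums of the load vector, i.e. of the loads sorted from fullest to emptiest bin."""
    sums, total = [], 0
    for x in sorted(loads, reverse=True):
        total += x
        sums.append(total)
    return sums


def majorizes(L, L_star):
    """L ⪰ L*: sum_{r<=j} L(r) >= sum_{r<=j} L*(r) for every j (same number of bins)."""
    return all(a >= b for a, b in zip(prefix_sums(L), prefix_sums(L_star)))


def strictly_majorizes(L, L_star):
    """L ≻ L*: L ⪰ L* and at least one prefix sum is strictly larger."""
    return majorizes(L, L_star) and prefix_sums(L) != prefix_sums(L_star)


# Moving one ball from a bin of load 4 to a bin of load 2 strictly improves the balance.
print(strictly_majorizes([4, 2, 1], [3, 3, 1]))   # True
```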

Let us observe two important consequences of Lemma 3. Firstly, this lemma implies 
that the maximum load never increases. Secondly, Lemma 3 yields the following claim: 



Lemma 4. The number of changes in the load vector is upper bounded by $m \cdot n$. □

Now, since we know that the algorithm gradually converges to a more balanced distribution of the bins' loads, we formally describe the states to which it converges. We say a system is stable in step $\tau$ if, independently of the random choices performed in the iterations $T \ge \tau$ of the Self-Balancing Algorithm, we will have $L_T = L_\tau$ for every $T \ge \tau$. In order to characterize stable states formally, we define a directed multigraph representing the state of the system (see also, e.g., [5,16], for similar representations).

Definition 1. A directed multigraph $G = (V, E)$ representing the system is a directed multigraph with the vertex set $V = \{1, \ldots, n\}$ corresponding to the bins in the system and the edge multiset $E$ (loops are allowed) corresponding to the assignment of the balls in the system. Each edge is associated with a ball, has as its endpoints the two locations of the associated ball, and is directed away from the bin containing the associated ball.
We denote by $G_t = (V, E_t)$ the directed multigraph representing $\mathcal{A}_t$. For any vertex $v$ of $G$ we denote by $\text{out-deg}(v)$ the out-degree of $v$ in $G$; if $G$ is not clear from the context, then we also use the notation $\text{out-deg}_G(v)$. The in-degree is defined analogously. Notice that since the locations of each ball are chosen at random, the undirected version of any $G_t$ is a random multigraph with $n$ vertices and $m$ edges (where each endpoint of each edge is selected independently and uniformly at random).

The following lemma follows directly from Definition 1.

Lemma 5. If $G_t = (V, E_t)$ is a directed multigraph representing $\mathcal{A}_t$, then for any $j$, $1 \le j \le n$, the out-degree of vertex $j$ is equal to the load of bin $j$ in $\mathcal{A}_t$. □

Let $G_t = (V, E_t)$ be the directed multigraph representing $\mathcal{A}_t$. A directed path $(v_1, v_2, \ldots, v_\ell)$ in $G_t$ is called a slope if $\text{out-deg}(v_1) \ge \text{out-deg}(v_\ell) + 2$ and $\text{out-deg}(v_1) > \text{out-deg}(v_{i+1})$ for every $i$, $1 \le i < \ell$. If $(v_1, v_2, \ldots, v_\ell)$ is a slope in $G_t$, then we can straighten it by modifying the directions of the edges in $G_t$ (following the rules in the Self-Balancing Algorithm) so that the load vector will change (see also the scheme presented in Figure 1). Indeed, let us consider the case $\ell \ge 3$ (the case $\ell = 2$ can be handled similarly), and assume (actually, without loss of generality) that $\text{out-deg}(v_1) = \text{out-deg}(v_2) + 1$, $\text{out-deg}(v_j) = \text{out-deg}(v_{j+1})$ for $2 \le j \le \ell - 2$, and $\text{out-deg}(v_{\ell-1}) = \text{out-deg}(v_\ell) + 1$. Then, we reverse the directions of the edges $(v_j, v_{j+1})$ for all $1 \le j \le \ell - 1$ (this can easily be done according to the rules in the Self-Balancing Algorithm). After applying these changes, the bin corresponding to the vertex $v_1$ has decreased its load by 1, the bin corresponding to the vertex $v_\ell$ has increased its load by 1, and the loads of all other bins remain the same. This implies that the load vectors $L$ of $\mathcal{A}_t$ and $L'$ of the new system state fulfill $L \succ L'$.

The following key lemma provides a necessary and sufficient condition for a system to be stable at step $\tau$. (Notice that the “only if” part follows from our arguments above.)

Lemma 6. A system is stable at step $\tau$ if and only if the directed multigraph $G_\tau = (V, E_\tau)$ representing $\mathcal{A}_\tau$ has no slope. □
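The slope definition suggests a direct stability test: from each vertex $v$, search only through vertices whose out-degree is smaller than $\text{out-deg}(v)$, and stop at one whose out-degree is at most $\text{out-deg}(v) - 2$. The Python sketch below is our own illustration with hypothetical names; each ball $x$ is stored as a pair `loc[x]` of its two locations plus the index `where[x]` of the occupied one. It finds a slope this way and straightens it by reversing its edges from the end backwards, the order in which, for a shortest slope, each reversal is a legal Self-Balancing move. In this model, `find_slope` returning `None` is exactly the stability criterion of Lemma 6.

```python
from collections import deque


def out_degrees(n, loc, where):
    """Lemma 5: the out-degree of vertex b equals the load of bin b."""
    deg = [0] * n
    for x, pair in enumerate(loc):
        deg[pair[where[x]]] += 1
    return deg


def find_slope(n, loc, where):
    """Return a slope [v_1, ..., v_l] of the multigraph, or None if it has no slope.

    Ball x contributes a directed edge from its current bin to its alternative bin.
    """
    deg = out_degrees(n, loc, where)
    adj = [[] for _ in range(n)]
    for x, pair in enumerate(loc):
        adj[pair[where[x]]].append(pair[1 - where[x]])
    for v in range(n):
        parent = {v: None}
        queue = deque([v])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                # After v_1, only vertices with out-degree < out-deg(v_1) may appear.
                if w in parent or deg[w] >= deg[v]:
                    continue
                parent[w] = u
                if deg[w] <= deg[v] - 2:       # valid endpoint v_l: rebuild the path
                    path = [w]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]
                queue.append(w)
    return None


def straighten(loc, where, load, path):
    """Reverse the slope's edges, starting with (v_{l-1}, v_l) and moving backwards."""
    for j in range(len(path) - 2, -1, -1):
        a, b = path[j], path[j + 1]
        x = next(x for x in range(len(loc))
                 if loc[x][where[x]] == a and loc[x][1 - where[x]] == b)
        where[x] = 1 - where[x]                 # ball x moves from bin a to bin b
        load[a] -= 1
        load[b] += 1


# The situation of Figure 1 with r = 3: loads 3, 2, 2, 2, 1 along the path 0 -> 1 -> 2 -> 3 -> 4.
loc = [(0, 1), (0, 0), (0, 0), (1, 2), (1, 1), (2, 3), (2, 2), (3, 4), (3, 3), (4, 4)]
where = [0] * len(loc)
load = out_degrees(5, loc, where)
path = find_slope(5, loc, where)
straighten(loc, where, load, path)
print(path, load)   # [0, 1, 2, 3, 4] [2, 2, 2, 2, 2]
```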

The next lemma describes a relationship between stable states and the maximum 
load in the system. 

Lemma 7. Consider a system of $m$ balls and $n$ bins with the minmax load $\kappa$. Then, if the system is stable in step $\tau$, the maximum load of $\mathcal{A}_\tau$ is $\kappa$.

Proof. The proof is by contradiction. Let us consider a system of $m$ balls and $n$ bins with the minmax load $\kappa$. Let us suppose the system is in a stable state $\mathcal{A}_\tau$ represented by the directed multigraph $G_\tau = (V, E_\tau)$, and, for the purposes of contradiction, let us assume that the maximum out-degree in $G_\tau$ is greater than $\kappa$.

Since $\mathcal{A}_\tau$ is a stable state, we know by Lemma 6 that $G_\tau$ has no slope. Let us pick any vertex $v \in V$ with $\text{out-deg}_{G_\tau}(v) > \kappa$. Let $U$ be the set of all vertices in $G_\tau$ (not including $v$) that are reachable from $v$ by a directed path in $G_\tau$. Since $G_\tau$ has no slope, all vertices in $U$ must have out-degree at least $\text{out-deg}_{G_\tau}(v) - 1 \ge \kappa$. Therefore, if we define $U^* = U \cup \{v\}$, then there are at least $|U| \cdot \kappa + (\kappa + 1)$ balls having both locations in the bins corresponding to the vertices in $U^*$ (every ball placed in a bin of $U^*$ has its alternative location in an out-neighbor of that bin, which again lies in $U^*$). This, however, by Lemma 1, means that the minmax load is at least $\left\lceil \frac{|U| \cdot \kappa + (\kappa + 1)}{|U| + 1} \right\rceil > \kappa$, which is a contradiction to our initial assumption that the minmax load of the system is $\kappa$. □
[Figure 1: out-degrees along the slope $(v_1, \ldots, v_5)$ before straightening ($r, r-1, r-1, r-1, r-2$) and after straightening ($r-1, r-1, r-1, r-1, r-1$).]

Fig. 1. Illustration describing the straightening procedure that changes the out-degrees of the vertices $v_1, v_2, \ldots, v_s$ (with $s = 5$) on the slope, performed in the proof of Lemma 6. In this case, initially we have $\text{out-deg}(v_1) = r$, $\text{out-deg}(v_2) = \text{out-deg}(v_3) = \text{out-deg}(v_4) = r - 1$, and $\text{out-deg}(v_5) = r - 2$.



Now we are ready to complete the proof of Theorem 3. By Lemma 6, the system is not stable if and only if the directed multigraph $G_\tau = (V, E_\tau)$ representing $\mathcal{A}_\tau$ has a slope $(v_1, \ldots, v_\ell)$ for a certain positive $\ell$. Thus, if the system is not stable, let us consider any shortest slope. Then, with positive probability, in the next $\ell - 1$ iterations the Self-Balancing Algorithm will perform slope straightening of $(v_1, \ldots, v_\ell)$, which will decrease the load of $v_1$ by 1, increase the load of $v_\ell$ by 1, and leave the remaining loads the same. Hence, if $\mathcal{A}_\tau$ is not stable, then after sufficiently many iterations of the Self-Balancing Step, with probability 1 the load vector will be modified. Since the load vector may change at most $n \cdot m$ times (Lemma 4), combining the arguments above with Lemma 7 shows that after sufficiently many iterations of the Self-Balancing Step, with probability 1, the system will be in a stable state in which the maximum load equals the minmax load.

□ 
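As an illustration of Lemma 7 (and of the limit state in Theorem 3), the Python sketch below repeatedly straightens slopes on small fixed instances until none is left, and then compares the maximum load with the minmax load computed from Lemma 1. It replaces the random choice of bin pairs by a deterministic depth-first slope search and reverses all edges of a slope at once (the final loads do not depend on the order), so it mirrors the proof rather than simulating the Self-Balancing Algorithm; all helper names are hypothetical, and the brute-force Lemma 1 check limits it to tiny $n$.

```python
def loads_of(n, loc, where):
    """Load of every bin (= out-degree of every vertex, by Lemma 5)."""
    load = [0] * n
    for x, pair in enumerate(loc):
        load[pair[where[x]]] += 1
    return load


def find_slope(n, loc, where, load):
    """Return the balls whose edges form a slope (v_1, ..., v_l), or None if there is no slope."""
    adj = [[] for _ in range(n)]
    for x, pair in enumerate(loc):
        adj[pair[where[x]]].append((pair[1 - where[x]], x))   # edge current -> alternative
    for v in range(n):
        stack, seen = [(v, [])], {v}
        while stack:
            u, balls = stack.pop()
            for w, x in adj[u]:
                if w in seen or load[w] >= load[v]:
                    continue                    # after v_1 only smaller out-degrees are allowed
                if load[w] <= load[v] - 2:
                    return balls + [x]          # reached a valid endpoint v_l
                seen.add(w)
                stack.append((w, balls + [x]))
    return None


def minmax_by_lemma1(n, loc):
    """Lemma 1: max over nonempty U of ceil(T[U] / |U|), by brute force over all subsets."""
    best = 0
    for mask in range(1, 1 << n):
        size = bin(mask).count("1")
        inside = sum(1 for p, q in loc if (mask >> p) & 1 and (mask >> q) & 1)
        best = max(best, -(-inside // size))
    return best


def stabilize(n, loc, where):
    """Straighten slopes until none is left; terminates because each round strictly improves L."""
    while True:
        load = loads_of(n, loc, where)
        balls = find_slope(n, loc, where, load)
        if balls is None:
            return load
        for x in balls:
            where[x] = 1 - where[x]             # reverse the edge of ball x


loc = [((5 * i + 1) % 6, (2 * i + 3) % 6) for i in range(14)]   # a fixed small instance
where = [0] * len(loc)
load = stabilize(6, loc, where)
print(max(load), minmax_by_lemma1(6, loc))     # Lemma 7 predicts the two numbers coincide
```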



4 Convergence to Optimal Assignment for $m \gg n$

In this section we briefly sketch the proof of Theorem 4, which estimates the convergence speed of the Self-Balancing Algorithm for $m \gg n$. First of all, let us recall that by Lemma 4, the load vector may change at most $n \cdot m$ times. Therefore, we only have to show that if the system is not stable, then after a polynomial number of steps of the Self-Balancing Algorithm the system will change its load vector with high probability. The following is the key theorem of our analysis (the proof is deferred to the full version of the paper).

Theorem 6. Let $n^{\ldots} \log n = o(\mu)$. Let $\xi$ be an arbitrary constant. Let $b$ be a bin with any load greater than or equal to $\bar{\mu} + \ldots$. Then, with probability at least $1 - \ldots$,

- either every bin has load greater than or equal to $\bar{\mu} + \xi$,
- or the directed multigraph representing the current state of the system has a directed path of length at most 2 from the vertex corresponding to $b$ to some other vertex $u$ whose out-degree is strictly smaller than $\bar{\mu}$.
In view of this theorem, with high probability, as long as the maximum load in the system is strictly larger than $\bar{\mu}$, the directed multigraph representing the state of the system always has a slope $(v_0, \ldots, v_t)$ with $t \le 2$, no matter how the directions of the edges are set. (Indeed, in that case there is a bin $b$ with load larger than $\bar{\mu}$, and if we set $\xi = 0$, then it is impossible that every bin in the system has load greater than or equal to $\bar{\mu}$, since the total load would then exceed $n\bar{\mu} \ge m$. Therefore, by Theorem 6, there must exist a directed path of length at most 2 from the vertex corresponding to $b$ to some other vertex $u$ such that the out-degree of $u$ is strictly smaller than $\bar{\mu}$. Therefore, either this path or one of its sub-paths must be a slope.) Therefore, with probability at least $1/\mathrm{poly}(n)$, the Self-Balancing Algorithm will, in at most two steps, perform slope straightening of $(v_0, \ldots, v_t)$ such that the out-degree of $v_0$ decreases from some $\ell$ to $\ell - 1$ and no other vertex on the path increases its out-degree to more than $\ell - 1$. Therefore, the system will change its load vector with probability at least $1/\mathrm{poly}(n)$. Hence, with high probability the system will change its load vector after $\mathrm{poly}(n)$ Self-Balancing Steps, and thus, after $m \cdot \mathrm{poly}(n)$ steps the Self-Balancing Algorithm will reach a state in which the maximum load equals the minmax load.

Actually, it is easy to see that our arguments above can be used to show that if the imbalance of the system is $\Delta$ (where $\Delta = \sum_{i=1}^{n} \max\{L(i) - \bar{\mu}, 0\}$), then the process needs only $\Delta \cdot \mathrm{poly}(n)$ steps to reach a perfect distribution, with high probability. This yields the proof in the heavily loaded case. □

5 Convergence to Optimal Assignment for $m = O(n)$

In this section we deal with the proof of Theorem 5 and consider the convergence speed of the Self-Balancing Algorithm in the lightly loaded case. We focus only on the case $m = O(n)$; we believe that this is the most challenging case and therefore we elaborate on its proof. The analysis of the case $m = O(n \log n / \log\log n)$, $m = \omega(n)$, is deferred to the full version of the paper.

The main idea behind the proof is to use arguments similar to those in the previous section, but this time we cannot assume that we have a slope of constant length. The analysis requires the following three key properties. The first property, proven in [16], is that if the pairs of locations for all the balls are chosen i.u.r., then (with high probability, depending only on the random choices of the locations) in any state of the system, if there is a bin with load greater than $\bar{\mu} + 1$, then there is a slope of length $O(\log n)$. The second property is that the sum of the degrees (in- and out-degrees) of all vertices on this slope path is at most $O(\log n)$. The third property is that the probability that a given slope path will be straightened is inversely proportional to the sum of the degrees of the vertices on this path. With these properties, we can show that the probability that in the next $O(mn \log n)$ Self-Balancing Steps a slope of length $O(\log n)$ is chosen and then straightened by the algorithm (without interfering with the other bins (vertices)) is at least $1/\mathrm{poly}(n)$. This implies that (with high probability) in the next $\mathrm{poly}(n)$ steps the Self-Balancing Algorithm will change the load vector. Therefore, (with high probability) after $\mathrm{poly}(n)$ steps the Self-Balancing Algorithm will reach a state in which, by Theorem 1, the maximum load is at most $\bar{\mu} + 1$, with high probability.

We now describe our analysis in more detail. We first develop some properties of the directed multigraphs discussed in Section 3. We begin with a lemma proven implicitly in [16, Lemma 14].
Lemma 8. [16] Let $G_t = (V, E_t)$ be a directed multigraph representing a certain $\mathcal{A}_t$. Let $m = O(n)$. Then, with high probability (depending only on the random locations of the balls), either $\mathcal{A}_t$ has the maximum load of at most $\bar{\mu} + 1$ or $G_t$ has a slope of length $O(\log n)$.


Our approach is to exploit Lemma 8. First of all, from now on, we shall condition on the fact that there is an assignment of the balls among the bins with maximum load $\bar{\mu} + 1$. (By Theorem 1, this fact holds with high probability.) Then, by Lemma 8, we know that either the system is in a state in which the maximum load is at most $\bar{\mu} + 1$, in which case we do not have to prove anything, or there is a slope in $G_t$ of length $O(\log n)$. We consider only the latter case.

We work in rounds, each round corresponding to $O(n^2 \log^2 n)$ repetitions of the Self-Balancing Step. All rounds are independent. At the beginning of each round we take any slope $\pi$ in $G$ of length $O(\log n)$ that is promised by Lemma 8 (if no such path exists, then we know that we are already in a state with maximum load smaller than or equal to $\bar{\mu} + 1$). We prove in Lemma 10 that with probability greater than or equal to $1/\mathrm{poly}(n)$ we will successfully straighten the slope in this round. From this and Theorem 3 it follows easily that after a polynomial number of rounds of the Self-Balancing Algorithm we reach a stable state having the maximum load at most $\bar{\mu} + 1$, with high probability.

Now, our ultimate goal is to analyze the probability that a slope of length $O(\log n)$ will be straightened in $O(n^2 \log^2 n)$ iterations of the Self-Balancing Algorithm. We begin with an auxiliary lemma about random (undirected) multigraphs (the proof is deferred to the full version of the paper).

Lemma 9. Let $b$ and $c$ be arbitrary positive constants. If $G$ is a random undirected multigraph with $n$ vertices and $m \le bn$ edges, then, with high probability, $G$ does not have any simple path of length less than or equal to $c \log n$ for which the sum of the degrees of the vertices on the path is greater than $d \cdot \log n$, where $d$ is a constant. □

Our next and key result shows that the probability that the Self-Balancing Algorithm 
will straighten a given slope path is inversely proportional to the sum of the degrees of 
the vertices on this path. 

Lemma 10. Let $b$ and $c$ be arbitrary positive constants. Let $G$ be an arbitrary directed multigraph with $n$ vertices and $m \le bn$ edges. Suppose there is a slope path $\pi = (v_1, \ldots, v_\ell)$ in $G$. Then, with probability greater than $\ldots$, the load vector will change after at most $2\ell m \log n$ iterations.

Proof. We only sketch the proof and defer more details to the full version of the paper.

Consider any slope $\pi = (v_1, v_2, \ldots, v_\ell)$ of shortest length in the system. Recall that $\text{out-deg}(v_1) - 1 = \text{out-deg}(v_2) = \text{out-deg}(v_3) = \cdots = \text{out-deg}(v_{\ell-1}) = \text{out-deg}(v_\ell) + 1$. If $\ell = 2$, then the probability that the load vector will change in the next step is at least as large as the probability that we choose the edge $(v_1, v_2)$, which is equal to $1/n^2$. Hence, in this case the lemma easily follows. Therefore, from now on we shall assume that $\ell \ge 3$, i.e., there are no edges $(y, z)$ in $G$ with $\text{out-deg}(y) > \text{out-deg}(z) + 1$.

Fig. 2. A slope $\pi = (v_1, v_2, \ldots, v_\ell)$ with incident edges. We only include those edges $(y, z)$ with $\text{out-deg}(y) = \text{out-deg}(z) + 1$.

We use the terminology from Figure 2. Initially, we have a slope $\pi = (v_1, \ldots, v_\ell)$ of length $\ell - 1$. In each iteration of the Self-Balancing Algorithm we hit a certain edge chosen at random, and in this way we may modify the graph and the load vector. We observe that if we hit an edge that neither belongs to $\pi$ nor is incident to $\pi$, then any eventual modification of that edge will not influence the path $\pi$. Therefore, we only have to consider the following eight cases, when an edge of the following form is chosen: (i) $(v_1, v_2)$, (ii) $(v_{\ell-1}, v_\ell)$, (iii) …, (iv) $(v_1, x_2)$, (v) $(x_3, v_3)$, (vi) $(v_3, x_4)$, (vii) $(x_5, v_\ell)$, and (viii) $(v_\ell, x_6)$. We say a very good edge is hit if we hit an edge from cases (iv) or (ix); a good edge is hit if we hit an edge from cases (i), (iii), (vi), or (vii); a bad edge is hit if we hit an edge from cases (v) or (viii). Very good edges create an edge $(y, z)$ with $\text{out-deg}(y) > \text{out-deg}(z) + 1$, good edges make the slope shorter, and bad edges make it longer.

Now, we consider a round lasting $2\ell n^2 \log n$ iterations and observe only very good edge hits, good edge hits, and bad edge hits. A round is called successful if no bad edge is hit until we either have a very good edge hit and then straighten the obtained path, or we modify the slope path (we straighten it) using only good edges. One can show that with probability greater than or equal to $1 - \ldots$ a round is either successful or we made a bad edge hit. Notice that there are at most $\text{out-deg}(v_1) + \text{in-deg}(v_\ell)$ bad edges at the beginning, and there is at least one good edge at any time. Certainly, under the assumption that either a bad edge or $(v_{\ell-1}, v_\ell)$ is picked, the probability that $(v_{\ell-1}, v_\ell)$ is picked is at least $1/(1 + \text{out-deg}(v_1) + \text{in-deg}(v_\ell))$. Once $(v_{\ell-1}, v_\ell)$ is picked, we concentrate on the edge $(v_{\ell-2}, v_{\ell-1})$, and so on. Using this approach, we get that the probability that a round is successful is lower bounded by $\ldots$, which completes the proof. □

We can reduce our analysis to the case when for the slope $\pi = (v_1, \ldots, v_\ell)$ we have $\text{out-deg}(v_1) = \bar{\mu} + 2$ and $\ell = O(\log n)$. Therefore, by Lemma 9 we know that $\text{in-deg}(v_\ell) = O(\log n)$, with high probability. Hence, by Lemma 10, the probability that in a round lasting $2nm \log n$ iterations we change the load vector is greater than or equal to $1/\mathrm{poly}(n)$. Hence, after $\mathrm{poly}(n)$ rounds (iterations) of the Self-Balancing Algorithm we shall modify the load vector with high probability. Now, since the load vector can be modified at most $m \cdot n$ times before we reach the stable state, the theorem follows. □
Abstract. We introduce strong blenders. A strong blender $\mathrm{Ble}(\cdot, \cdot)$ uses weak sources $X, Y$ to produce $\mathrm{Ble}(X, Y)$ that is statistically random even if one is given $Y$. Strong blenders generalize strong extractors [15] and extractors from two weak random sources [25,6]. We show that non-constructive strong blenders can extract all the randomness from $X$, as long as $Y$ has logarithmic min-entropy. We also give explicit strong blenders which work provided the sum of the min-entropies of $X$ and $Y$ is at least their block length. Finally, we show that strong blenders have applications to cryptographic systems for parties that have independent weak sources of randomness. In particular, we extend the results of Maurer and Wolf [12] and show that parties that are not able to sample even a single truly random bit can still perform privacy amplification over an adversarially controlled channel.



1 Introduction 

Imperfect Randomness. Randomization has proved to be extremely useful and fundamental in many areas of computer science. Unfortunately, in many situations one does not have ideal sources of randomness, and has to base a given application on imperfect sources of randomness. Among many imperfect sources considered so far, perhaps the most general and realistic source is the weak source [28,6]. The only thing guaranteed about a weak source is that no string (of some given length $\ell$) occurs with probability more than $2^{-b}$, where $b$ is the so-called min-entropy of the source. We will call this source $(\ell, b)$-weak. Handling such weak sources is often necessary in many applications, as it is typically hard to assume much structure on the source besides the fact that
* Partially supported by the NSF CAREER Award. 

** Supported by a doctoral fellowship from CNPq, Brazil. 

S. Arora et al. (Eds.): APPROX 2003 + RANDOM 2003, LNCS 2764, pp. 252-263, 2003.

© Springer-Verlag Berlin Heidelberg 2003



On Extracting Private Randomness over a Public Channel 






it contains some randomness. Thus, by now a universal goal in basing some 
application on imperfect sources is to make it work with the weak source. 
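To make the definition concrete, the short Python sketch below (helper names are hypothetical) computes the min-entropy $-\log_2 \max_s \Pr[s]$ of an explicitly given finite distribution and checks the $(\ell, b)$-weak condition.

```python
from math import log2


def min_entropy(dist):
    """Min-entropy -log2(max_s Pr[s]) of a distribution given as {string: probability}."""
    return -log2(max(dist.values()))


def is_weak_source(dist, length, b):
    """(length, b)-weak: every string has the given length and no probability exceeds 2^-b."""
    return all(len(s) == length for s in dist) and max(dist.values()) <= 2.0 ** (-b)


# Uniform over 4 of the 8 strings of length 3: a (3, 2)-weak source that is far from uniform.
flat = {"000": 0.25, "011": 0.25, "101": 0.25, "110": 0.25}
print(min_entropy(flat), is_weak_source(flat, 3, 2))   # 2.0 True
```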

The most direct way of utilizing weak sources would be to extract nearly perfect randomness from such a source. Unfortunately, it is trivial to see [6] that no deterministic function can extract even one random bit from a weak source, as long as $b < \ell$ (i.e., the source is not random to begin with). This observation leaves two possible options. First, one can try to use weak sources for a given application without an intermediate step of extracting randomness from them. Second, one can try designing probabilistic extractors, and later justify where and how one can obtain the additional randomness needed for extraction.
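The trivial argument behind this claim can be sketched in Python (names are hypothetical): for any candidate one-bit extractor $f$, one of its two preimages contains at least half of all $2^\ell$ strings, so the flat source on that preimage is $(\ell, \ell - 1)$-weak, yet $f$ is constant on it.

```python
from itertools import product


def constant_source_for(f, length):
    """For any f: {0,1}^length -> {0,1}, return (bit, support) such that f equals bit on support.

    The larger preimage has at least 2^(length - 1) strings, so the uniform distribution on it
    has min-entropy at least length - 1, yet f's output on it is not random at all.
    """
    preimage = {0: [], 1: []}
    for bits in product("01", repeat=length):
        s = "".join(bits)
        preimage[f(s)].append(s)
    bit = 0 if len(preimage[0]) >= len(preimage[1]) else 1
    return bit, preimage[bit]


def parity(s):
    return s.count("1") % 2


bit, support = constant_source_for(parity, 4)
print(bit, len(support))   # 0 8
```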

Using a Single Weak Source. A large and successful line of research [26,24,6,7,28,3] following the first approach showed that a single weak source is sufficient to simulate any probabilistic computation of decision or optimization problems (i.e., problems with a unique “correct” output which are potentially solved more efficiently using randomization; this class is called BPP). Unfortunately, most of the methods in this area are not applicable in situations where randomness is needed by the application itself, and not mainly for the purposes of efficiency. One prime example of this is cryptography. For example, secret keys have to be random, and many cryptographic primitives (such as public-key encryption) must be probabilistic. Thus, new methods are needed to base cryptographic protocols on weak sources. So far, this question has only been studied in the setting of information-theoretic symmetric-key cryptography. In this scenario, the shared secret key between the sender and the recipient is no longer random, but comes from a weak source. As a very negative result, McInnes and Pinkas [13] proved that one cannot securely encrypt even a single bit, even when using an “almost random” $(\ell, \ell - 1)$-weak source. Thus, one cannot base symmetric-key encryption on weak sources. Dodis and Spencer [9] also consider the question of message authentication and show that one cannot (non-interactively) authenticate even one bit using an $(\ell, \ell/2)$-weak source (this bound is tight, as Maurer and Wolf [12] showed how to authenticate up to $\ell/2$ bits when $b > \ell/2$).

Basing more advanced cryptographic primitives on a single weak random source also promises to be challenging. For example, it is not clear how to meaningfully model access to a single weak source by many users participating in a given cryptographic protocol. Additionally, moving to the computational setting will likely require making very non-standard cryptographic assumptions.

Using Several Weak Sources. Instead, we will assume that each party has its own weak source, which is independent from all the other weak random sources. In other words, while each individual party cannot assume that his source is truly random, the parties are located “far apart” so that their imperfect sources are independent from each other. For simplicity, we will restrict the number of independent sources to two for the remainder of this paper. One of the questions we will consider is whether it is possible to construct cryptographic protocols, like secret-key encryption or key exchange, in this new setting. In fact, rather than construct these primitives from scratch, we will try to extract nearly ideal randomness from two weak sources, and then simply use whatever standard methods exist for the cryptographic task at hand!

This brings us to the question of randomness extraction from two or more independent random sources, a question originated in the works of Santha and Vazirani [17,25]. Chor and Goldreich [6] were the first to consider general weak sources of equal block length; let us say that the sources $X$ and $Y$ are $(\ell_1, b_1)$-weak and $(\ell_2, b_2)$-weak, for concreteness, while here we also assume $\ell_1 = \ell_2 = \ell$. They showed that a random function can extract almost $(b_1 + b_2 - \ell)$ nearly random bits in this setting. They also gave an explicit number-theoretic construction that can essentially match this (non-optimal) bound. Moreover, they showed that the simple inner product function is also a good bit-extractor under the same condition that $b_1 + b_2 > \ell$. Recently, Trevisan and Vadhan [23] broke the “barrier” $b_1 + b_2 > \ell$, but only for the very “imbalanced” case when $b_1 = \varepsilon^{\ldots}\ell$, $b_2 = (1 - O(\varepsilon))\ell$ (for any $\varepsilon > 0$). To summarize, while non-trivial randomness extraction is possible, the known constructions and parameters seem far from optimal. Unfortunately, improving this situation seems to be extremely challenging. Indeed, it is easy to see that the question of extracting randomness from two independent sources beyond what is currently known is even harder than the notoriously hard problem of explicitly constructing certain bipartite Ramsey graphs (see [27,16]).

Strong Extractors. A special case of the above question has received a huge amount of attention recently. It involves the case when one of the two sources, say $Y$, is perfect: $b_2 = \ell_2$. In this case, one invests $b_2$ bits of true randomness $Y$ (called the seed) and hopes to extract nearly $b_1 + b_2$ random bits from $Y$ and a given $(\ell_1, b_1)$-weak source $X$. A deterministic function Ext achieving this task has simply been called an extractor [15]. A strong extractor additionally requires $Y$ itself to be part of the extracted randomness. In this case, $Y$ is usually excluded from the output of Ext, so that the goal becomes to extract up to $b_1$ random bits from $X$. By now, it is well known that one can indeed achieve this goal provided $b_2 \gtrsim \log \ell_1$. Moreover, many explicit constructions of strong extractors which come very close to this bound are known by now (see [14,22,11,19,18] and the references therein). Not surprisingly, strong extractors have found many applications (e.g., see [18]).

Our Question. The general question of extracting randomness from two weak sources [17,25,6] concentrated on regular, non-strong extractors. If the extracted randomness is to be used as the secret key of a conventional cryptographic system, this means that one should sample $X$ and $Y$ from two independent weak sources, and securely “transport” $X$ to $Y$. Consider, for example, the following application. Alice and Bob are together and wish to communicate securely once they are apart. They can agree on an auxiliary secret key $X$ sampled from their common weak source. When Bob moves far away, he gets access to an independent source $Y$. Assuming the parameters are right, Bob can now extract a nearly random secret key $S = \mathrm{Ext}(X, Y)$. However, Alice only knows $X$, so

¹ A trivial strengthening of their technique can push this number to $\min(b_1, b_2)$; we will later non-trivially push this to $b_1 + b_2$.
Bob has to send $Y$ to Alice. In cryptography, it is conventional to assume that the communication channel between Alice and Bob is public. Thus, Bob has to send $Y$ “in the clear”. With regular extractors, even from two weak sources, there is no guarantee that $\mathrm{Ext}(X, Y)$ will look random to an eavesdropper who learns $Y$. On the other hand, conventional strong extractors resolve this problem, but rely on the strong assumption that Alice can sample a truly random $Y$ and send it over the channel. In a world with no “true randomness” and only weak sources, this assumption is not realizable, unless eventually two independent sources are secretly brought together.

The above example motivates our common generalization of previous work. We wish to consider strong extractors with weak seeds. We will call such functions strong blenders.² Namely, we want to design a function Ble such that $\mathrm{Ble}(X, Y)$ looks random even to an observer who knows $Y$, for any $X$ and $Y$ sampled from their corresponding $(\ell_1, b_1)$- and $(\ell_2, b_2)$-weak sources.

Our Results. As we demonstrate, such remarkable strong blenders exist. In particular, we show that a random function can be used to extract essentially all the randomness from $X$ (i.e., nearly $b_1$ bits), provided only that $b_2 > \log \ell_1$ (and also $b_1 > \log \ell_2$). The condition $b_2 > \log \ell_1$ says that as long as the public seed $Y$ has barely enough randomness, we can extract almost all the randomness from our target source $X$. Clearly, this bound generalizes the standard setting of (strong) extractors, where one needs $\ell_2 = b_2 > \log \ell_1$ to extract all the randomness from $X$. We also remark that our analysis non-trivially extends the previous work.³ It involves constructing a martingale and then applying Azuma's inequality to bound its deviation from the mean, thus strengthening what was known for regular (non-strong) extraction from two weak sources [6]. As mentioned, their result gave only $b_1 + b_2 - \ell$ bits. It is easy to improve this to $\min(b_1, b_2)$ bits, but getting $b_1 + b_2$ bits, which follows from our more general bound, does not seem possible with the standard Chernoff-type bounds used by [6].

Next, we address explicit constructions of strong blenders. Unfortunately, the large body of work on strong extractors does not seem to be applicable to strong blenders. Intuitively, standard extractors use the seed to pick a bit from a codeword of a list-decodable code [22], or to select a hash function from a small family of functions. These arguments seem to fall apart completely once the seed comes from a weak random source. On the other hand, any explicit construction of strong blenders will in particular imply extraction from two independent weak sources, for which any improvement seems very hard, as we mentioned earlier. Thus, the best we can hope for is to extend the best known constructions in this latter setting to yield strong blenders. And, indeed, this is exactly what we achieve. First, we show that the inner product function is a one-bit strong blender for the case $\ell_1 = \ell_2 = \ell$, provided $b_1 + b_2 > \ell$. This argument

² We propose that functions that extract randomness from two weak sources be called blenders. With this choice of terminology, strong blenders are related to (regular) blenders in the same way strong extractors are related to their regular counterparts.
³ We are not aware of any written proof for the existence of strong extractors, as all the references we found point to [20,21].
involves extending the combinatorial lemma of Lindsay (see Section 4). Second, we show that Vazirani's multi-bit extraction for SV-sources can be applied to weak sources as well. This allows one to extract $\Omega(\ell)$ bits provided $b_1 + b_2 \gtrsim 3\ell/2$. Finally, we show that the explicit extractor of [6] based on discrete logarithms can also be extended to our setting, which gives a way to extract nearly $(b_1 + b_2 - \ell)/2$ random bits. Again, we remark that all these extensions actually involve non-trivial modifications to the existing arguments.

Privacy Amplification. Finally, we return to applications of strong blenders to the setting where different parties have independent weak sources, but all the communication between them is public. The most natural such application is that of key agreement (aka privacy amplification [5,4]) by public discussion: sending $Y$ over the channel allows Alice and Bob to agree on a (nearly) random key $S = \mathrm{Ble}(X, Y)$, provided the communication channel is authentic. Therefore, the remaining interesting case to consider is what happens when the channel is not only public, but adversarially controlled [12]. In particular, the question is whether we can build any kind of message authentication with a shared key coming from an $(\ell_1, b_1)$-weak block source, and without any local randomness. Specifically, assume Alice and Bob share a key $A, B, X$ coming from 3 (possibly correlated) samples from the $(\ell_1, b_1)$-weak block source. When Bob gets his hands on an independent source $Y$, he would like to authenticate $Y$ using $A, B$. Then, they both can agree on the key $S = \mathrm{Ble}(X, Y)$, where Ble is our strong blender. As we mentioned, [9] showed that non-interactive one-time authentication from Alice to Bob is impossible when $b_1 < \ell_1/2$. On the other hand, Maurer and Wolf [12] gave a way to non-interactively authenticate up to $\ell_1$ bits using two blocks of min-entropy $\ell_1/2$.⁴ Assuming (as we shall) that $\ell_1 = \ell_2$, Bob can indeed authentically transmit $Y$ over the channel, so that both can apply a strong blender to agree on a random $S = \mathrm{Ble}(X, Y)$. Combining this observation with our explicit constructions of strong blenders for $\ell_1 = \ell_2 = \ell$, we get the first efficient privacy amplification protocol without ideal local randomness, provided $b_1 \gtrsim \ell/2$ and $b_2 \gtrsim \ell - b_1$.

Online version. We refer to the online manuscript [8] for the proofs of results 
that are only stated in this work. 

2 Preliminaries 

2.1 Basic Notation 

We mostly employ standard notation. The symbol log is reserved for the base 2 logarithm. For a positive integer $t$, $U_t$ denotes a random variable that is uniform over $\{0,1\}^t$ and independent of all other random variables under consideration. We also write $[t] = \{1, 2, \ldots, t\}$. For two random variables $A, B$ taking values in
⁴ We remark that, unlike in our setting, Alice and Bob had ideal local randomness in the setting of [12], and used it at later stages of their application. Luckily, the authentication step was deterministic, which makes it “coincidentally applicable” to our situation.
the finite set $\mathcal{A}$, their statistical distance is $\|A - B\| = \frac{1}{2}\sum_{a \in \mathcal{A}} |\Pr(A = a) - \Pr(B = a)|$, and the min-entropy of $A$ is $H_\infty(A) = \min_{a \in \mathcal{A}} -\log \Pr(A = a)$. Finally, if $C$ is another random variable, $C|_{A=a}$ represents the distribution of $C$ conditioned on $A = a \in \mathcal{A}$.
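Both quantities just defined are easy to compute for small, explicitly given distributions. The sketch below is illustrative only: it assumes a distribution is given as a Python dict mapping outcomes (here, bit strings encoded as integers) to probabilities, and the helper names are ours, not the paper's.

```python
import math


def statistical_distance(p, q):
    """||A - B|| = 1/2 * sum_a |Pr(A = a) - Pr(B = a)|, with p, q as {outcome: probability}."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)


def min_entropy(p):
    """H_inf(A) = min_a -log2 Pr(A = a) = -log2 max_a Pr(A = a)."""
    return -math.log2(max(p.values()))


def uniform(t):
    """Distribution of U_t, with t-bit strings encoded as the integers 0 .. 2^t - 1."""
    return {a: 2.0 ** -t for a in range(2 ** t)}


# A flat source: uniform on 4 of the 8 three-bit strings.
flat = {a: 0.25 for a in range(4)}
print(min_entropy(flat))                       # 2.0
print(statistical_distance(flat, uniform(3)))  # 0.5
```

A flat source on $2^b$ strings always has min-entropy exactly $b$, which is why flat sources reappear in Proposition 1 below.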

2.2 Strong Extractors Vs. Strong Blenders 

Min-entropy quantifies the amount of hidden randomness in a source $X$. The objective of extractors is to purify this randomness with the aid of a small number of truly random bits.

Definition 1. Let $k > 0$, $\varepsilon > 0$. A $(k, \varepsilon)$-extractor $\mathrm{Ext} : \{0,1\}^n \times \{0,1\}^d \to \{0,1\}^m$ is a function such that for all $n$-bit random variables $X$ with min-entropy $H_\infty(X) \ge k$, $\|\mathrm{Ext}(X, U_d) - U_m\| \le \varepsilon$. Ext is a $(k, \varepsilon)$-strong extractor if the function $\mathrm{Ext}' : (x, y) \mapsto y \circ \mathrm{Ext}(x, y)$ is an extractor.

In this work, however, we are interested in strong randomness extraction from two weak sources, as defined below.

Definition 2. [6] The set $\mathrm{CG}(\ell_1, \ell_2, b_1, b_2)$ of pairs of independent (Chor-Goldreich) weak sources is the set of all pairs of independent random variables $(X, Y)$ where $X$ (respectively $Y$) is $\ell_1$ (resp. $\ell_2$) bits long and $H_\infty(X) \ge b_1$ (resp. $H_\infty(Y) \ge b_2$).



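Definition 3. A $(b_1, b_2, \varepsilon)$-strong blender (SB) is a function $\mathrm{Ble} : \{0,1\}^{\ell_1} \times \{0,1\}^{\ell_2} \to \{0,1\}^m$ such that for all pairs $(X, Y)$ in $\mathrm{CG}(\ell_1, \ell_2, b_1, b_2)$ we have

$$\|(Y, \mathrm{Ble}(X, Y)) - (Y, U_m)\| \le \varepsilon.$$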

We state for later convenience the following proposition, which can be deduced from the linear programming argument in [6] (i.e., the fact that general sources of a given min-entropy are convex combinations of flat distributions with the same min-entropy).

Proposition 1. If $b_1$ and $b_2$ are integers, then for any function $f : \{0,1\}^{\ell_1} \times \{0,1\}^{\ell_2} \to \{0,1\}^m$ the maximum of $\|(Y, f(X, Y)) - (Y, U_m)\|$ over all $(X, Y)$ contained in the set $\mathrm{CG}(\ell_1, \ell_2, b_1, b_2)$ is achieved by flat random variables, that is, by a pair $(X, Y)$ for which $X$ is uniform over a subset $S_X \subseteq \{0,1\}^{\ell_1}$ with $|S_X| = 2^{b_1}$, and $Y$ is uniform over $S_Y \subseteq \{0,1\}^{\ell_2}$, $|S_Y| = 2^{b_2}$.
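Proposition 1 means that, for tiny parameters, the worst case in Definition 3 can be found by brute force over flat pairs. The following sketch is ours and only feasible for very small $\ell_1, \ell_2$. For flat $X$ and $Y$, it computes $\|(Y, f(X,Y)) - (Y, U_m)\|$ exactly by averaging, over $y \in S_Y$, the distance of $f(X, y)$ from $U_m$. The inner product is used as one hypothetical choice of $f$.

```python
from fractions import Fraction
from itertools import combinations


def sb_distance_flat(f, S_X, S_Y, m):
    """Exact ||(Y, f(X, Y)) - (Y, U_m)|| for X uniform on S_X and Y uniform on S_Y."""
    total = Fraction(0)
    for y in S_Y:
        counts = [0] * (2 ** m)
        for x in S_X:
            counts[f(x, y)] += 1
        # Statistical distance between f(X, y) and U_m, i.e. conditioned on Y = y.
        total += Fraction(1, 2) * sum(
            abs(Fraction(c, len(S_X)) - Fraction(1, 2 ** m)) for c in counts
        )
    return total / len(S_Y)


def worst_flat_distance(f, l1, l2, b1, b2, m):
    """Max over all flat pairs with |S_X| = 2^b1, |S_Y| = 2^b2 (the worst case, by Proposition 1)."""
    return max(
        sb_distance_flat(f, S_X, S_Y, m)
        for S_X in combinations(range(2 ** l1), 2 ** b1)
        for S_Y in combinations(range(2 ** l2), 2 ** b2)
    )


def inner_product(x, y):
    """<x, y> mod 2 for bit strings encoded as integers."""
    return bin(x & y).count("1") % 2


print(sb_distance_flat(inner_product, range(4), range(4), 1))  # 1/8
print(worst_flat_distance(inner_product, 2, 2, 1, 1, 1))       # 1/2
```

Exact rational arithmetic (`Fraction`) is used so that the computed distances can be compared with closed-form bounds without rounding noise.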

3 Existence of Strong Blenders 

From now on $m$, $\ell_1 \ge b_1 \ge 2$ and $\ell_2 \ge b_2 \ge 2$ are positive integers and $\varepsilon > 0$ is a real number. The aim of this section is to prove non-constructively that strong blenders exist for certain choices of parameters, and to provide lower bounds that almost match the existence result. The theorems below are proven in [8].
Theorem 1. There exists a $(b_1, b_2, \varepsilon)$-SB $\mathrm{Ble} : \{0,1\}^{\ell_1} \times \{0,1\}^{\ell_2} \to \{0,1\}^m$ for any choice of parameters satisfying $m < b_1 - 2\log\frac{1}{\varepsilon}$, $b_1 > \log_2(\ell_2 - b_2) + 2\log\frac{1}{\varepsilon} + O(1)$ and $b_2 > \log_2(\ell_1 - b_1) + 2\log\frac{1}{\varepsilon} + O(1)$.

By noticing that strong extractors are special cases of strong blenders (e.g., using $Y = \ldots$), and applying the lower bounds of Ta-Shma and Radhakrishnan [21], we obtain (see [8] for details)

Theorem 2. For some constant $c$, if $b_1 < \ell_1 - c$ and $b_2 < \ell_2 - c$, then the conditions $m < b_1 - 2\log\frac{1}{\varepsilon}$ and $b_2 > \log_2(\ell_1 - b_1) + 2\log\frac{1}{\varepsilon} + O(1)$ of Theorem 1 are in fact necessary for the existence of a $(b_1, b_2, \varepsilon)$-SB $\mathrm{Ble} : \{0,1\}^{\ell_1} \times \{0,1\}^{\ell_2} \to \{0,1\}^m$.



4 Efficient Constructions 



4.1 Hadamard Matrices and Extraction of One Bit 

A class of 1-bit strong blenders which includes the inner product function is now considered, thus providing a strengthening of a result of Chor and Goldreich [6]. Identify $[L] = [2^\ell] \cong \{0,1\}^\ell$ and let $H = (H_{xy})_{x,y=1}^{L}$ be an $L \times L$ Hadamard matrix (i.e., a $\pm 1$ matrix with pairwise orthogonal rows/columns). Define

$$\mathrm{Ble}_H : \{0,1\}^\ell \times \{0,1\}^\ell \to \{0,1\}, \qquad (x, y) \mapsto \frac{1 - H_{xy}}{2}.$$

We shall prove the following two results.

Theorem 3. $\mathrm{Ble}_H$ as defined above is a $(b_1, b_2, \varepsilon)$-SB with $\log\frac{1}{\varepsilon} = \frac{b_1 + b_2 - \ell}{2} + 1$.



Corollary 1. The inner product function on $\ell$-bit strings is a $(b_1, b_2, \varepsilon)$-SB with $\varepsilon$ as above.



Proof (of Corollary 1). Inner product is of the form $\mathrm{Ble}_H$ for some Hadamard matrix $H$ (as one can easily show). □
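One way to see Corollary 1 concretely is through the Sylvester construction. The sketch below is our illustration, not the paper's proof. It builds the $2^\ell \times 2^\ell$ Sylvester Hadamard matrix recursively, checks that its rows are pairwise orthogonal, and checks that $\mathrm{Ble}_H(x, y) = (1 - H_{xy})/2$ coincides with the inner product mod 2. The bit convention $(1 - H_{xy})/2$ is an assumption of the sketch.

```python
def sylvester(l):
    """2^l x 2^l Sylvester Hadamard matrix: H_1 = [[1]], H_2L = [[H, H], [H, -H]]."""
    H = [[1]]
    for _ in range(l):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def is_hadamard(H):
    """True if H is a +-1 matrix whose rows are pairwise orthogonal."""
    L = len(H)
    if any(v not in (1, -1) for row in H for v in row):
        return False
    return all(
        sum(H[i][k] * H[j][k] for k in range(L)) == (L if i == j else 0)
        for i in range(L)
        for j in range(L)
    )


def ble_H(H, x, y):
    """Assumed convention: output 0 when H[x][y] = +1 and 1 when H[x][y] = -1."""
    return (1 - H[x][y]) // 2


def inner_product(x, y):
    return bin(x & y).count("1") % 2


l = 4
H = sylvester(l)
print(is_hadamard(H))
print(all(ble_H(H, x, y) == inner_product(x, y)
          for x in range(2 ** l) for y in range(2 ** l)))
```

Each doubling step appends one more (most significant) bit to the row and column indices and multiplies the entry by $(-1)^{bc}$ for the new bits $b, c$. By induction, this gives $H_{xy} = (-1)^{\langle x, y\rangle}$.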



Proof (of Theorem 3). The proof parallels that of the corresponding theorem in [6]. In particular, we also employ Lindsay's Lemma.

Lemma 1. (Lindsay's Lemma, cf. [6]) Let $G = (G_{ij})_{i,j=1}^{T}$ be a $T \times T$ Hadamard matrix, and let $R$ and $C$ be subsets of $[T]$ corresponding to choices of rows and columns of $G$ (respectively). Then $\big|\sum_{i \in R,\, j \in C} G_{ij}\big| \le \sqrt{|R||C|T}$.



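For any choice of $(q_1, \ldots, q_L) \in \{-1, +1\}^L$, the matrix $H' = (H'_{ij})$ whose $i$th row is $q_i$ times the $i$th row of $H$ is Hadamard. Hence Lindsay's Lemma applies, and for all sets $R, C \subseteq [L]$ the sum $\sum_{i \in R,\, j \in C} H'_{ij}$, which is just $\sum_{i \in R} q_i \sum_{j \in C} H_{ij}$, is bounded by $\sqrt{|R||C|L}$. Choosing each $q_i$ to be the sign of $\sum_{j \in C} H_{ij}$, it is easy to deduce from this fact a stronger form of Lemma 1:

$$\forall R, C \subseteq [L]: \quad \sum_{i \in R} \Big| \sum_{j \in C} H_{ij} \Big| \le \sqrt{|R||C|L} \qquad (2)$$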
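The sign-flipping argument above can be checked numerically. In the sketch below, a Sylvester Hadamard matrix and random subsets $R$, $C$ (drawn with Python's `random` module) are our own illustrative choices. Taking $q_i$ to be the sign of the row sum over $C$ turns the left-hand side of (2) into a plain sum of entries of the flipped matrix $H'$, and Lindsay's Lemma bounds that sum by $\sqrt{|R||C|L}$.

```python
import math
import random


def sylvester(l):
    H = [[1]]
    for _ in range(l):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def check_inequality_2(H, R, C):
    """Return (left side of (2), sum of H' over R x C, bound sqrt(|R||C|L))."""
    L = len(H)
    row_sums = {i: sum(H[i][j] for j in C) for i in R}
    lhs = sum(abs(s) for s in row_sums.values())
    # q_i = sign of the row sum over C (either sign works when the sum is 0).
    q = {i: 1 if s >= 0 else -1 for i, s in row_sums.items()}
    flipped = sum(q[i] * H[i][j] for i in R for j in C)
    return lhs, flipped, math.sqrt(len(R) * len(C) * L)


rng = random.Random(0)
H = sylvester(6)
L = len(H)
for _ in range(100):
    R = rng.sample(range(L), rng.randint(1, L))
    C = rng.sample(range(L), rng.randint(1, L))
    lhs, flipped, bound = check_inequality_2(H, R, C)
    assert lhs == flipped        # the identity behind the sign flip
    assert lhs <= bound + 1e-9   # inequality (2)
```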
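Now let $(X, Y) \in \mathrm{CG}(\ell, \ell, b_1, b_2)$ be flat random variables, and assume that $X$ is uniform on $S_X$, $|S_X| = 2^{b_1}$, and $Y$ is uniform on $S_Y$, $|S_Y| = 2^{b_2}$. Applying (2) (to the Hadamard matrix $H^T$, since the outer sum below runs over columns), one obtains precisely the desired inequality

$$\|(Y, \mathrm{Ble}_H(X, Y)) - (Y, U_1)\| = \frac{1}{|S_Y|} \sum_{y \in S_Y} \|\mathrm{Ble}_H(X, y) - U_1\| = \frac{1}{2|S_X||S_Y|} \sum_{y \in S_Y} \Big| \sum_{x \in S_X} H_{xy} \Big| \le \frac{\sqrt{L\,|S_X||S_Y|}}{2|S_X||S_Y|} = 2^{-\frac{b_1 + b_2 - \ell}{2} - 1} \qquad (3)$$

Since flat sources are the worst case by Proposition 1, (3) proves Theorem 3. □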
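As a sanity check on the bound, the sketch below uses parameters of our own choosing: random flat sources with $\ell = 6$ and $b_1 = b_2 = 4$. For the inner product, it computes the distance in two ways, directly from the definition and as $\frac{1}{2|S_X||S_Y|}\sum_{y}|\sum_{x} H_{xy}|$ with $H_{xy} = (-1)^{\langle x, y\rangle}$. It then compares the result with $2^{-(b_1+b_2-\ell)/2-1} = 1/4$. Random subsets are not worst-case inputs; it is the theorem that guarantees the bound for every flat pair.

```python
import random


def inner_product(x, y):
    return bin(x & y).count("1") % 2


def distance_via_hadamard_sum(S_X, S_Y):
    """(1 / (2 |S_X| |S_Y|)) * sum_{y in S_Y} |sum_{x in S_X} H_xy|, H_xy = (-1)^<x, y>."""
    total = sum(abs(sum((-1) ** inner_product(x, y) for x in S_X)) for y in S_Y)
    return total / (2 * len(S_X) * len(S_Y))


def distance_direct(S_X, S_Y):
    """Average over y in S_Y of |Pr[<X, y> = 0] - 1/2| (the 1-bit statistical distance)."""
    return sum(
        abs(sum(1 for x in S_X if inner_product(x, y) == 0) / len(S_X) - 0.5)
        for y in S_Y
    ) / len(S_Y)


l, b1, b2 = 6, 4, 4
bound = 2 ** (-(b1 + b2 - l) / 2 - 1)
rng = random.Random(1)
worst = 0.0
for _ in range(200):
    S_X = rng.sample(range(2 ** l), 2 ** b1)
    S_Y = rng.sample(range(2 ** l), 2 ** b2)
    d = distance_via_hadamard_sum(S_X, S_Y)
    assert abs(d - distance_direct(S_X, S_Y)) < 1e-12
    worst = max(worst, d)
print(worst, "<=", bound)
```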



4.2 Extracting Many Bits Using Error-Correcting Codes

We now adapt a construction from [25] based on error-correcting codes to obtain many bits from weak sources of the same length and sufficiently high min-entropy. In what follows, $\mathrm{Ecc} : \{0,1\}^m \to \{0,1\}^\ell$ is a linear error-correcting code with distance $d$, $\{e_i : 1 \le i \le m\}$ is the canonical basis of $\{0,1\}^m$ as a vector space over $\mathbb{Z}_2$, and for $(x, y) = ((x_1, \ldots, x_\ell), (y_1, \ldots, y_\ell)) \in \{0,1\}^\ell \times \{0,1\}^\ell$ we let $v(x, y) \in \{0,1\}^\ell$ be the vector whose $i$th coordinate is $x_i y_i$. The proposed SB is

$$\mathrm{Ble} : \{0,1\}^\ell \times \{0,1\}^\ell \to \{0,1\}^m, \qquad (x, y) \mapsto \big(\mathrm{Ecc}(e_1) \cdot v(x, y)\big) \circ \cdots \circ \big(\mathrm{Ecc}(e_m) \cdot v(x, y)\big) \qquad (4)$$

Note that the $i$th output bit of Ble is the inner product of the restrictions of $x$ and $y$ to the support of $\mathrm{Ecc}(e_i)$. We show below that

Theorem 4. The function Ble constructed above is a $(b_1, b_2, \varepsilon)$-SB with $\log\frac{1}{\varepsilon} = 1 + \frac{b_1 + b_2 + d}{2} - (\ell + m)$.
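The parameter trade-off in Theorem 4 is simple arithmetic. The sketch below works it through with hypothetical numbers: $\ell = 1024$, $\delta = 1/16$ (so $d = (1/2 - \delta)\ell = 448$), $b_1 = b_2 = 832$, and $m = 16$. We make no claim that a code with exactly these parameters is available. The second function evaluates the per-character target $\varepsilon / 2^m$ used as the right-hand side of (5) in the proof, which should equal $2^{\ell - (b_1+b_2+d)/2 - 1}$.

```python
from fractions import Fraction


def log_inv_eps_thm4(l, b1, b2, d, m):
    """log2(1/eps) = 1 + (b1 + b2 + d)/2 - (l + m), as in Theorem 4."""
    return 1 + Fraction(b1 + b2 + d, 2) - (l + m)


def log_per_character_target(l, b1, b2, d):
    """log2 of 2^(l - (b1 + b2 + d)/2 - 1), the target for each non-zero parity a."""
    return l - Fraction(b1 + b2 + d, 2) - 1


l, b1, b2, d, m = 1024, 832, 832, 448, 16   # hypothetical parameters
k = log_inv_eps_thm4(l, b1, b2, d, m)
print(k)  # 17, i.e. eps = 2^-17
# eps / 2^m must match the per-character target:
assert -k - m == log_per_character_target(l, b1, b2, d)
```

Each additional output bit costs one bit of $\log(1/\varepsilon)$. This is the price of the union over the $2^m - 1$ non-zero parities in the Parity Lemma.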

There exist efficiently encodable linear codes of codeword length $\ell$, dimension $m = \ldots$ and distance $d = (\frac{1}{2} - \delta)\ell$, for all fixed $0 < \delta < \frac{1}{2}$. Plugging one such code into Theorem 4 yields an efficiently computable $(b_1, b_2, \varepsilon)$-SB with $\varepsilon = \ell^{-\omega(1)}$ for all min-entropies satisfying $b_1, b_2 > (\frac{3}{4} + \delta)\ell + \omega(\log \ell)$, and the number of extracted bits is $m = \Omega(\ell)$. We prove Theorem 4 below with the aid of the following two lemmas (the second being fairly standard).

Lemma 2. (Parity Lemma, [25]) For any $t$-bit random variable $T$, $\|T - U_t\|$ is upper-bounded by $\sum_{a \in \{0,1\}^t \setminus \{0\}} \|(T \cdot a) - U_1\|$.

Lemma 3. If $Z = Z_1 Z_2 \ldots Z_t$ is a $t$-bit random variable and $W \subseteq [t]$, let $Z|_W$ denote the concatenation of all $Z_i$ with $i \in W$. Then $H_\infty(Z|_W) \ge H_\infty(Z) - t + |W|$.
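Both lemmas can be checked numerically on small random distributions. The sketch below is our illustration. It indexes bit positions from 0 rather than 1 and represents a distribution as a dict from $t$-bit integers to probabilities. It verifies the Parity Lemma's bound and, for every non-empty $W$, Lemma 3's min-entropy bound.

```python
import math
import random


def dist_from_uniform(p, t):
    """||T - U_t|| for a distribution p on t-bit integers."""
    return 0.5 * sum(abs(p.get(z, 0.0) - 2.0 ** -t) for z in range(2 ** t))


def parity_distance(p, a):
    """||(T . a) - U_1|| = |Pr[<T, a> = 0] - 1/2|."""
    pr0 = sum(q for z, q in p.items() if bin(z & a).count("1") % 2 == 0)
    return abs(pr0 - 0.5)


def min_entropy(p):
    return -math.log2(max(p.values()))


def restrict(p, W):
    """Distribution of Z|_W: keep only the bits whose (0-based) positions lie in W."""
    out = {}
    for z, q in p.items():
        key = tuple((z >> i) & 1 for i in sorted(W))
        out[key] = out.get(key, 0.0) + q
    return out


t = 5
rng = random.Random(2)
weights = [rng.random() for _ in range(2 ** t)]
total = sum(weights)
p = {z: w / total for z, w in enumerate(weights)}

# Lemma 2 (Parity Lemma): ||T - U_t|| <= sum over non-zero a of ||(T . a) - U_1||
lhs = dist_from_uniform(p, t)
rhs = sum(parity_distance(p, a) for a in range(1, 2 ** t))
assert lhs <= rhs + 1e-12

# Lemma 3: H_inf(Z|_W) >= H_inf(Z) - t + |W| for every non-empty W
for mask in range(1, 2 ** t):
    W = [i for i in range(t) if (mask >> i) & 1]
    assert min_entropy(restrict(p, W)) >= min_entropy(p) - t + len(W) - 1e-9
```

Lemma 3 holds because each outcome of $Z|_W$ is the image of at most $2^{t - |W|}$ outcomes of $Z$. Its probability is therefore at most $2^{t-|W|}$ times the largest point probability of $Z$.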

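Proof (of Theorem 4). By Lemma 2 and some simple calculations, it suffices to show that for any $a \in \{0,1\}^m \setminus \{0\}$

$$\|(Y, (\mathrm{Ble}(X, Y) \cdot a)) - (Y, U_1)\| \le \frac{\varepsilon}{2^m} = 2^{\ell - \frac{b_1 + b_2 + d}{2} - 1} \qquad (5)$$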
Fix some non-zero $a = (a_1, \ldots, a_m)$; note that by the linearity of Ecc

$$\mathrm{Ble}(X, Y) \cdot a = \sum_{i} a_i \big(\mathrm{Ecc}(e_i) \cdot v(X, Y)\big) = \mathrm{Ecc}(a) \cdot v(X, Y) = (X|_S) \cdot (Y|_S) \qquad (6)$$

where $S$ is the set of all non-zero coordinates of $\mathrm{Ecc}(a)$, and $X|_S$ and $Y|_S$ are defined as in Lemma 3. Applying that lemma, we conclude that $X|_S$ (resp. $Y|_S$) has min-entropy at least $b_1 - \ell + |S|$ (resp. $b_2 - \ell + |S|$). It now follows from (6), Corollary 1, Lemma 3 and the fact that $X|_S$ and $Y|_S$ have length $|S|$ that

$$\|(Y, (\mathrm{Ble}(X, Y) \cdot a)) - (Y, U_1)\| \le 2^{\frac{|S| - b_1 - b_2 + 2\ell - 2|S|}{2} - 1} = 2^{\ell - \frac{b_1 + b_2 + |S|}{2} - 1} \qquad (7)$$

Since $|S| = (\text{weight of } \mathrm{Ecc}(a)) \ge d$ by definition of $S$, equation (7) proves (5) and finishes the proof. □
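The linearity identity (6) is the heart of this proof. The sketch below illustrates it with one concrete code of our own choosing: the first-order Reed-Muller code with $\ell = 8$, $m = 4$ and $d = 4$, given by four generator rows that play the role of $\mathrm{Ecc}(e_1), \ldots, \mathrm{Ecc}(e_4)$. It builds Ble as defined above. It then checks that every non-zero codeword has weight at least $d$, and that for random $x, y$ and every non-zero $a$, $\mathrm{Ble}(x,y)\cdot a = \mathrm{Ecc}(a)\cdot v(x,y) = (x|_S)\cdot(y|_S)$.

```python
import random

# Generator rows Ecc(e_1), ..., Ecc(e_4) of the [8, 4, 4] first-order Reed-Muller code.
GEN = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
ELL, M = 8, 4


def ecc(a):
    """Encode a in {0,1}^M (list of bits) as the XOR of the generator rows selected by a."""
    word = [0] * ELL
    for a_i, row in zip(a, GEN):
        if a_i:
            word = [w ^ r for w, r in zip(word, row)]
    return word


def dot(u, w):
    return sum(ui & wi for ui, wi in zip(u, w)) % 2


def ble(x, y):
    """Output bit i is Ecc(e_i) . v(x, y), where v(x, y)_j = x_j y_j."""
    v = [xj & yj for xj, yj in zip(x, y)]
    return [dot(row, v) for row in GEN]


def bits(n, width):
    return [(n >> i) & 1 for i in range(width)]


rng = random.Random(3)
nonzero = [bits(n, M) for n in range(1, 2 ** M)]
assert min(sum(ecc(a)) for a in nonzero) == 4          # distance d = 4
for _ in range(200):
    x = [rng.randint(0, 1) for _ in range(ELL)]
    y = [rng.randint(0, 1) for _ in range(ELL)]
    out = ble(x, y)
    for a in nonzero:
        S = [j for j, c in enumerate(ecc(a)) if c]         # support of Ecc(a)
        lhs = dot(out, a)                                   # Ble(x, y) . a
        mid = dot(ecc(a), [xj & yj for xj, yj in zip(x, y)])
        rhs = dot([x[j] for j in S], [y[j] for j in S])     # (x|_S) . (y|_S)
        assert lhs == mid == rhs
```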



4.3 A Number-Theoretic Construction 



A third efficient SB construction is now presented. Its minimal min-entropy requirement is basically $b_1 + b_2 > \ell$, which roughly matches the Hadamard matrix construction for 1-bit extraction. However, this SB has the drawback of requiring a pre-processing stage to achieve efficiency. The construction dates back to [6], in which it was shown that $\mathrm{Ble}(X, Y)$ is close to random. We claim that the same is true even if $Y$ is given to the adversary, thus establishing that this construction satisfies our definition of SB. In what follows, $p > 2$ is a prime and we take $\ell = \lfloor \log p \rfloor$ so that we can assume $\{0,1\}^\ell \subseteq \mathbb{Z}_p$. Let $k$ be a divisor of $p - 1$; our SB will output elements of $\mathbb{Z}_k$ (the definition of a SB easily generalizes to this case). Finally, let $g$ be a generator of the multiplicative group $\mathbb{Z}_p^*$ and denote by $\log_g$ the base-$g$ discrete logarithm in $\mathbb{Z}_p^*$. We define

$$\mathrm{Ble} : \{0,1\}^\ell \times \{0,1\}^\ell \to \mathbb{Z}_k, \qquad (x, y) \mapsto \log_g(x - y) \bmod k \qquad (8)$$

We prove below that approximately $m = \log k \approx \frac{b_1 + b_2 - \ell}{2}$ bits are extracted by this construction.

Theorem 5. The function Ble defined above is a $(b_1, b_2, \varepsilon)$-SB with $\log\frac{1}{\varepsilon} = \frac{b_1 + b_2 - \ell + 1}{2} - \log k$.

We refer to [6] for details on the efficient implementation of Ble and the pre-computation of $p$, $k$ and $g$.
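For small primes the construction can be implemented directly, with the discrete logarithm precomputed as a lookup table. This is only a stand-in for the efficient pre-computation discussed in [6], which we do not reproduce. The sketch below is ours. It uses $p = 67$ (so $\ell = 6$) and $k = 2$, and finds a generator by brute force. Because $\log_g 0$ is undefined, it adopts the hypothetical convention $\mathrm{Ble}(x, x) = 0$. It compares the exact distance for random flat sources with $b_1 = b_2 = 5$ against $\frac{k}{2}\sqrt{p / 2^{b_1+b_2}}$, plus a slack of $\frac{1}{2p} + 2^{-(b_1+1)}$. The slack covers our $x = y$ convention and the fact that each class has $(p-1)/k$ rather than $p/k$ elements.

```python
import math
import random


def prime_factors(n):
    fs, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            fs.add(d)
            n //= d
        d += 1
    if n > 1:
        fs.add(n)
    return fs


def find_generator(p):
    """Smallest generator of Z_p^* (brute force; fine for small p)."""
    qs = prime_factors(p - 1)
    for g in range(2, p):
        if all(pow(g, (p - 1) // q, p) != 1 for q in qs):
            return g
    raise ValueError("p must be prime")


def make_ble(p, k):
    assert (p - 1) % k == 0
    g = find_generator(p)
    dlog = {pow(g, e, p): e for e in range(p - 1)}  # discrete-log lookup table

    def ble(x, y):
        diff = (x - y) % p
        return 0 if diff == 0 else dlog[diff] % k  # hypothetical convention for x == y
    return ble


def sb_distance_flat(ble, S_X, S_Y, k):
    """Exact ||(Y, Ble(X, Y)) - (Y, U)|| for flat X, Y, with U uniform on Z_k."""
    total = 0.0
    for y in S_Y:
        counts = [0] * k
        for x in S_X:
            counts[ble(x, y)] += 1
        total += 0.5 * sum(abs(c / len(S_X) - 1 / k) for c in counts)
    return total / len(S_Y)


p, k, b1, b2 = 67, 2, 5, 5
l = int(math.log2(p))           # floor(log p) = 6, so {0,1}^6 is a subset of Z_67
ble = make_ble(p, k)
bound = k / 2 * math.sqrt(p / 2 ** (b1 + b2))
slack = 1 / (2 * p) + 2 ** -(b1 + 1)
rng = random.Random(4)
for _ in range(100):
    S_X = rng.sample(range(2 ** l), 2 ** b1)
    S_Y = rng.sample(range(2 ** l), 2 ** b2)
    assert sb_distance_flat(ble, S_X, S_Y, k) <= bound + slack
print(round(bound, 4))
```

The lookup table costs $O(p)$ time and memory. This is why the construction is only practical with a pre-processing stage, or for primes for which discrete logarithms are otherwise easy.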



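Proof (of Theorem 5). The following inequality (proven in [8]) holds for all subsets $A, B, C \subseteq \mathbb{Z}_p$: setting $\Phi_C = \max_{1 \le j \le p-1} \big| \sum_{c \in C} e^{2\pi i jc/p} \big|$,

$$\sum_{a \in A} \left| \#\{b \in B : a - b \in C\} - \frac{|B||C|}{p} \right| \le \Phi_C \sqrt{|A||B|} \qquad (9)$$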
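Assuming (9), choose $\alpha \in \mathbb{Z}_k$ and set $C = \{c \in \mathbb{Z}_p^* \mid \log_g(c) \equiv \alpha \bmod k\}$. Following [6, Section 3.2], we note that $|C| \approx p/k$ and $\Phi_C < \sqrt{p}$. Hence for all $A, B \subseteq \{0,1\}^\ell \subseteq \mathbb{Z}_p$

$$\sum_{a \in A} \left| \#\{b \in B : \log_g(a - b) \equiv \alpha\} - \frac{|B|}{k} \right| \le \sqrt{p\,|A||B|} \qquad (10)$$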
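The character-sum fact $\Phi_C < \sqrt{p}$ quoted from [6] can be spot-checked numerically for small primes. The sketch below uses primes and divisors $k$ of our own choosing. For every class $C = \{c : \log_g c \equiv \alpha \bmod k\}$, it computes $\Phi_C = \max_{1 \le j \le p-1} |\sum_{c \in C} e^{2\pi i jc/p}|$ and confirms that $|C| = (p-1)/k$.

```python
import cmath
import math


def find_generator(p):
    """Smallest generator of Z_p^*, found by brute force."""
    n, qs, d = p - 1, set(), 2
    while d * d <= n:
        while n % d == 0:
            qs.add(d)
            n //= d
        d += 1
    if n > 1:
        qs.add(n)
    return next(g for g in range(2, p) if all(pow(g, (p - 1) // q, p) != 1 for q in qs))


def phi(C, p):
    """Phi_C = max over 1 <= s <= p-1 of |sum_{c in C} exp(2 pi i s c / p)|."""
    return max(abs(sum(cmath.exp(2j * math.pi * s * c / p) for c in C)) for s in range(1, p))


for p in (67, 73, 101):
    g = find_generator(p)
    for k in [d for d in range(2, 11) if (p - 1) % d == 0]:
        for alpha in range(k):
            C = [pow(g, e, p) for e in range(p - 1) if e % k == alpha]
            assert len(C) == (p - 1) // k
            assert phi(C, p) < math.sqrt(p)
```

The classical reason is that each such class is a coset of the subgroup of $k$-th powers. Its exponential sum is therefore an average of Gauss sums of absolute value $\sqrt{p}$ plus one trivial term, which gives a bound of $(1 + (k-1)\sqrt{p})/k < \sqrt{p}$.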



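We deduce from (10) that for any choice of flat random variables $(X, Y)$ in the subset $\mathrm{CG}(\ell, \ell, b_1, b_2)$ with respective supports $S_X$, $S_Y$ of sizes $2^{b_1}$, $2^{b_2}$,

$$\|(Y, \mathrm{Ble}(X, Y)) - (Y, U)\| = \frac{1}{2^{b_1+b_2+1}} \sum_{\alpha \in \mathbb{Z}_k} \sum_{y \in S_Y} \left| \#\{x \in S_X : \log_g(x - y) \equiv \alpha\} - \frac{2^{b_1}}{k} \right| \le \frac{k}{2} \sqrt{\frac{p}{2^{b_1+b_2}}} \le \varepsilon \qquad (11)$$

where $U$ is uniform on $\mathbb{Z}_k$, the last step uses $p < 2^{\ell+1}$, and the order of the arguments is immaterial because $\log_g(x - y)$ and $\log_g(y - x)$ differ by the constant $\log_g(-1)$ while (10) holds for every $\alpha$. This implies the theorem by Proposition 1. □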



5 Simple Authentication with Weak Sources 

5.1 Motivation 

As noted in the Introduction, strong blenders trivially solve the problem of privacy amplification over passive public channels when only weak random sources are used. In this section we provide a simple protocol PA for privacy amplification over an adversarially controlled channel, when only weak sources of randomness are available. Following [12], we show that weak sources can be used in conjunction with the simple “$ay + b$” message authentication code (MAC) to transmit the non-secret input $Y$ over the adversarial channel.

In our simplified model, Bob can either be close (to Alice) or far (from Alice), and each of them has a weak source (specified below). If Bob is close, they can share secret information, but their sources could be arbitrarily correlated. On the other hand, if Bob is far, the sources can be assumed to be independent, but only an adversarially controlled communication channel is available. Bob's source outputs an $\ell$-bit string $Y$ with min-entropy $H_\infty(Y) \ge b_2$, and Alice's source outputs three $\ell$-bit strings $A, B, X$, which are assumed to form a $b_1$-block source [6]. That is, for any $a, b \in \{0,1\}^\ell$, $A$, $B|_{A=a}$ and $X|_{A=a,B=b}$ all have min-entropy at least $b_1$.

Our scenario differs from that of previous work on privacy amplification (e.g., [12]) in that Alice and Bob are not capable of sampling perfectly random bits. Moreover, geographical distance between the sources is necessary for independence, which is a reasonable assumption for physical and adversarial sources. Whereas in [12], for instance, it is not clear why the parties could not simply agree on a perfectly random secret key when they first meet, this is impossible in our case. Therefore, the need for privacy amplification is arguably better motivated in the present work. We also note that, although our assumption on Alice's source is stronger than that on Bob's source, it is still much weaker than the capability to generate truly random bits.



262 Yevgeniy Dodis and Roberto Oliveira 



5.2 The Protocol 

Alice and Bob's aim is to agree on a secret key $S$ that is very close to being random from Eve's point of view. This is achieved by the protocol PA, which we now describe (see also Table 1 in [8]), in which we identify $\{0,1\}^\ell$ with the finite field $\mathbb{F}_{2^\ell}$ for the purpose of arithmetic operations, and $\mathrm{Ble} : \{0,1\}^\ell \times \{0,1\}^\ell \to \{0,1\}^m$ is a function (we will later choose it to be a suitable SB). Briefly, Alice and Bob share $(A, B, X)$ when Bob is close. Then Bob moves far away, samples $Y$ and sends $(Y, Z = AY + B)$ to Alice. Eve then intercepts $(Y, Z)$ and retransmits a possibly different pair $(Y', Z')$ to Alice. Alice checks whether $AY' + B = Z'$ and, if so, computes $S' = \mathrm{Ble}(X, Y')$; otherwise she rejects. In the meantime, Bob has computed $S = \mathrm{Ble}(X, Y)$. Theorem 6 (proven in [8]) shows that with high probability either $S' = S$ and Alice and Bob share a secret key, or else Alice has rejected. This is true as long as $b_1 = \ell/2 + \omega(\log \ell)$ and a $(b_1, b_2, \varepsilon)$-SB exists. For instance, the number-theoretic SB (Theorem 5) permits agreement on a key of length $m \approx \ldots$

Theorem 6. If Ble is a $(b_1, b_2, \varepsilon)$-SB, the protocol PA has the following property. If Eve is passive, Alice never rejects, $S' = S$ and $\|(Y, S) - (Y, U_m)\| \le \varepsilon$. If Eve is active, the probability that either Alice rejects, or $S' = S$ and

$$\|(Y, S) - (Y, U_m)\| \le \varepsilon,$$

is at least $1 - 2^{\ldots}$.
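The message flow of PA can be sketched as follows. The sketch rests on our own assumptions. We take $\ell = 8$ and represent $\mathbb{F}_{2^8}$ modulo the irreducible polynomial $x^8 + x^4 + x^3 + x + 1$ (the AES field), with addition as XOR. The 1-bit inner product stands in for Ble. Sampling from the weak sources is outside the sketch, so the example values are simply fixed. The sketch shows Alice's check $AY' + B = Z'$. It also shows that for a fixed forgery $(Y', Z')$ with $Y' \ne Y$, exactly one of the $2^8$ keys $(A, B)$ consistent with Bob's message $(Y, Z)$ makes the forgery pass.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (shift-and-add)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return r


def ble(x, y):
    """Stand-in strong blender: inner product mod 2 of the 8-bit strings x and y."""
    return bin(x & y).count("1") % 2


def bob_send(A, B, X, Y):
    """Bob (far away) sends (Y, Z = AY + B) and keeps S = Ble(X, Y)."""
    return (Y, gf_mul(A, Y) ^ B), ble(X, Y)


def alice_receive(A, B, X, msg):
    """Alice accepts (Y', Z') iff AY' + B = Z' and outputs S' = Ble(X, Y'); None means reject."""
    Y2, Z2 = msg
    if gf_mul(A, Y2) ^ B != Z2:
        return None
    return ble(X, Y2)


A, B, X, Y = 0x53, 0xCA, 0x9E, 0x35           # example values only
msg, S = bob_send(A, B, X, Y)
assert alice_receive(A, B, X, msg) == S        # passive Eve: Alice accepts and S' = S

# Active Eve forges (Y', Z') with Y' != Y. Among the 2^8 keys (A, B) consistent
# with Bob's message (Y, Z), count those for which the forgery is accepted.
Z = msg[1]
forged = (0x36, 0x99)
passing = sum(1 for a in range(256) if gf_mul(a, forged[0]) ^ (Z ^ gf_mul(a, Y)) == forged[1])
print(passing)  # 1
```

The count is always 1, because acceptance is equivalent to $A(Y' + Y) = Z' + Z$, which has a unique solution $A$ when $Y' \ne Y$. This is why the analysis needs $A$ to retain enough min-entropy given Eve's view.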
Abstract. The preferential attachment graph is a random graph formed by adding a new vertex at each time step, with a single edge which points to a vertex selected at random with probability proportional to its degree. Every $m$ steps the most recently added $m$ vertices are contracted into a single vertex, so at time $t$ there are roughly $t/m$ vertices and exactly $t$ edges. This process yields a graph which has been proposed as a simple model of the world wide web [BA99]. For any constant $k$, let $\Delta_1 \ge \Delta_2 \ge \cdots \ge \Delta_k$ be the degrees of the $k$ highest degree vertices. We show that at time $t$, for any function $f$ with $f(t) \to \infty$ as $t \to \infty$, $\frac{t^{1/2}}{f(t)} < \Delta_1 < t^{1/2} f(t)$, and for $i = 2, \ldots, k$, $\frac{t^{1/2}}{f(t)} < \Delta_i < \Delta_{i-1} - \frac{t^{1/2}}{f(t)}$, with high probability (whp). We use this to show that at time $t$ the largest $k$ eigenvalues of the adjacency matrix of this graph have $\lambda_i = (1 \pm o(1))\Delta_i^{1/2}$ whp.



1 Introduction 

Recently there has been much interest in understanding the properties of real-world large-scale networks such as the structure of the Internet and the World Wide Web. For a general introduction to this topic, see Bollobás and Riordan [BR02], Hayes [Hay00], or Watts [Wat99]. One approach is to model these networks by random graphs. Experimental studies by Albert, Barabási, and Jeong [ABJ99], Broder et al. [BKM+00], and Faloutsos, Faloutsos, and Faloutsos [FFF99] have demonstrated that in the World Wide Web/Internet the proportion of vertices of a given degree follows an approximate inverse power law, i.e., the proportion of vertices of degree $k$ is approximately $Ck^{-\alpha}$ for some constants $C, \alpha$. The classical models of random graphs introduced by Erdős and Rényi [ER59] do not have power law degree sequences, so they are not suitable for modeling these networks. This has driven the development of various alternative models for random graphs.

One approach to remedy this situation is to study graphs with a prescribed 
degree sequence (or prescribed expected degree sequence). This is proposed as 
a model for the web graph by Aiello, Chung, and Lu in [ACL00]. Mihail and Papadimitriou also use this model [MP02] in their study of large eigenvalues, as do Chung, Lu, and Vu in [CLV].

An alternative approach, which we will follow in this paper, is to sample 
graphs via some generative procedure which yields a power law distribution. 
There is a long history of such models, outlined in the survey by Mitzenmacher [Mit01]. We will use the preferential attachment model to generate our random graph. The preferential attachment random graph has been the subject of recently revived interest. It dates back to Yule [Yul25] and Simon [Sim55]. It was proposed as a model for the web by Barabási and Albert [BA99], and their description was elaborated by Bollobás, Riordan, Spencer, and Tusnády [BRST01], who proved that the degree sequence does follow a power law distribution. Bollobás and Riordan obtained several additional results regarding the diameter and connectivity of such graphs [BR]. We use the generative model of [BRST01] (see also [BR02]) and build a graph sequentially as follows:

— At each time step $t$, we add a vertex $v_t$, and we add an edge from $v_t$ to some vertex $u$, where $u$ is chosen at random according to the distribution

$$\Pr[u = v_i] = \begin{cases} \dfrac{d_t(v_i)}{2t - 1}, & \text{if } v_i \ne v_t; \\[2mm] \dfrac{1}{2t - 1}, & \text{if } v_i = v_t, \end{cases}$$

where $d_t(v)$ denotes the degree of vertex $v$ at time $t$. This means that each vertex receives an additional edge with probability proportional to its current degree. The probability of choosing $v_t$ (and forming a loop) is consistent with this, since we've already committed “half” an edge to $v_t$ and are deciding where to put the other half.

— For some constant m, every m steps we contract the most recently added m 
vertices to form a super vertex. 

Let $G_t^m$ denote the random graph at time step $t$ with contractions of size $m$. Note that contracting each set of vertices $\{im + 1, im + 2, \ldots, (i+1)m\}$ of $G_t^1$ yields a graph identically distributed with $G_t^m$.
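The generative process above can be simulated in a few lines. The sketch below is our illustration, not code from the paper. It keeps a list containing every edge endpoint. Drawing uniformly from that list together with the new vertex's own half-edge gives exactly the probabilities $d_t(v_i)/(2t-1)$ and $1/(2t-1)$ above, with loops counting twice toward the degree. The sketch then contracts consecutive blocks of $m$ vertices into supernodes.

```python
import math
import random


def preferential_attachment_degrees(t, m, seed=None):
    """Supernode degrees of G_t^m after t steps of the process described above."""
    rng = random.Random(seed)
    endpoints = []  # vertex v appears here deg(v) times
    for step in range(t):  # `step` is the 0-based index of the vertex added at time step + 1
        # 2*step existing endpoints plus the new half-edge: 2(step + 1) - 1 choices
        r = rng.randrange(2 * step + 1)
        u = endpoints[r] if r < 2 * step else step  # r == 2*step means a loop at v_t
        endpoints.extend((step, u))
    degrees = [0] * ((t + m - 1) // m)
    for v in endpoints:
        degrees[v // m] += 1  # vertices im+1, ..., (i+1)m form supernode i
    return degrees


t, m = 100_000, 2
deg = sorted(preferential_attachment_degrees(t, m, seed=7), reverse=True)
print([round(d / math.sqrt(t), 2) for d in deg[:5]])  # top degrees in units of t^(1/2)
```

The endpoint list makes each step $O(1)$, which is the usual way to sample degree-proportional attachment without maintaining explicit weights.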

It is worth mentioning that there are several alternative simple models for 
the World Wide Web and for general power law graphs. A generalization of the 
preferential attachment model is described by Drinea, Enachescu, and Mitzenmacher in [DEM01], and degree sequence results analogous to [BRST01] are proved for this model by Buckley and Osthus in [BO01]. A completely different generative model, based on the idea that new webpages are often consciously or unconsciously copies of existing pages, is developed by Kleinberg et al. and Kumar et al. in [KKR+99], [KRRT99], [KRR+00b], [KRR+00a]. Cooper and Frieze analyze a model combining these approaches in [CF01].
The results in previous papers on preferential attachment graphs concern low degree vertices. For example, the results in [BRST01] concern degrees up to $\ldots$. Our first theorem deals with the highest degree vertices:

Theorem 1. Let $m$ and $k$ be fixed positive integers, and let $f(t)$ be a function with $f(t) \to \infty$ as $t \to \infty$. Let $\Delta_1 \ge \Delta_2 \ge \cdots \ge \Delta_k$ denote the degrees of the $k$ highest degree vertices of $G_t^m$. Then

$$\frac{t^{1/2}}{f(t)} < \Delta_1 < t^{1/2} f(t)$$

and for $i = 2, \ldots, k$,

$$\frac{t^{1/2}}{f(t)} < \Delta_i < \Delta_{i-1} - \frac{t^{1/2}}{f(t)}$$

whp¹.

The next theorem relates maximum eigenvalues and maximum degrees. It mirrors results of Mihail and Papadimitriou [MP02] and Chung, Lu and Vu [CLV] for fixed degree expectation models, and, at a high level, the proof follows the same lines as these two papers. Experimentally, a power law distribution for eigenvalues was observed in “real-world” graphs in [FFF99].

Theorem 2. Let $m$ and $k$ be fixed positive integers, and let $f(t)$ be a function with $f(t) \to \infty$ as $t \to \infty$. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_k$ be the $k$ largest eigenvalues of the adjacency matrix of $G_t^m$. Then for $i = 1, \ldots, k$ we have $\lambda_i = (1 \pm o(1))\Delta_i^{1/2}$ whp.



Our proofs of these theorems require two lemmas. 



Lemma 1. Let $d_t^m(s)$ denote the degree of vertex $s$ in $G_t^m$. Then for any positive integer $k$,



E 









fc /2 



To simplify the exposition, we speak of a supernode, which is simply a col- 
lection of vertices viewed as one vertex. So the degree of a supernode is the sum 
of the degrees of the vertices in the supernode, and an edge is incident to a 
supernode if it is incident to some vertex in the supernode. 



Lemma 2. Let $\mathcal{S} = \{S_1, S_2, \ldots, S_\ell\}$ be a collection of disjoint supernodes, and let $p_{\mathcal{S}}(r; d, t_0, t)$ denote the probability that each supernode $S_i$ has degree $r_i + d_i$ at time $t$ conditioned on $d_{t_0}(S_i) = d_i$. Let $d = \sum_{i=1}^{\ell} d_i$ and $r = \sum_{i=1}^{\ell} r_i$. If $d = \ldots$ and $r = o(t^{\ldots})$, then



ps{v,d,to,t) < 



(nC-:id‘ 




d/2 

exp 




d 

2 




In the next section we prove Theorems 1 and 2. The proofs of Lemmas 1 and 2 are too long to include here, and we defer them to the final version.



¹ In this paper an event $\mathcal{E}$ is said to hold with high probability (whp) if $\Pr[\mathcal{E}] \to 1$ as $t \to \infty$.
2 Proof of Theorems

2.1 Proof of Theorem 1

We partition the vertices into those added before time $t_0$, those added between times $t_0$ and $t_1$, and those added after $t_1$, and argue about the maximum degree of vertices in each set. Here

$$t_0 = \log\log\log f(t) \quad \text{and} \quad t_1 = \log\log f(t).$$

We break the proof of Theorem 1 into five claims.

Claim 1. In $G_t^m$ the degree of the supernode of vertices added before time $t_0$ is at least $t_0^{\ldots}\,t^{1/2}$ whp.

Proof. Let $\mathcal{A}_1$ denote the event that the supernode consisting of the first $t_0$ vertices has degree less than $t_0^{\ldots}\,t^{1/2}$. We bound the probability of this event using Lemma 2 with $\ell = 1$. Since at time $t_0$ the supernode of all vertices added by this time has all of the edges, we take $d = d_1 = 2t_0$. Then

$$\Pr[\mathcal{A}_1] \le \cdots = o(1).$$

□

Claim 2. In $G_t^m$ no vertex added after time $t_1$ has degree exceeding $t_0^{\ldots}\,t^{1/2}$ whp.

Proof. Let $\mathcal{A}_2$ denote the event that some vertex added after time $t_1$ has degree exceeding $t_0^{\ldots}\,t^{1/2}$. Then we have

$$\Pr[\mathcal{A}_2] \le \sum_{s > t_1} \Pr\big[d_t(s) > t_0^{\ldots}\,t^{1/2}\big].$$

Using Lemma 1, this bound becomes

$$\Pr[\mathcal{A}_2] \le \sum_{s > t_1} \cdots = o(1).$$

□
Claim 3. In $G_t^m$ no vertex added before time $t_1$ has degree exceeding $\ldots$ whp.

Proof. Let $\mathcal{A}_3$ denote the event that some vertex added before $t_1$ has degree exceeding $\ldots$. Then, using Lemma 1 for a third moment argument as above, we have

$$\Pr[\mathcal{A}_3] \le \sum_{s=1}^{t_1} \cdots = o(1).$$

□

Claim 4. The $k$ highest degree vertices of $G_t^m$ are added before time $t_1$ and have degrees bounded by $t_0^{\ldots}\,t^{1/2} < \Delta_i < \ldots$ whp.

Proof.

(Upper bound on $\Delta_i$) By Claim 2, all vertices added after time $t_1$ have degree at most $t_0^{\ldots}\,t^{1/2}$ whp. Combining this with Claim 3, we have $\Delta_1 < \ldots$ whp.

(Lower bound on $\Delta_i$) The conditions from Claims 1, 2, and 3 imply the lower bound. To see this, suppose the conditions of these claims are satisfied, but assume for contradiction that at most $k - 1$ vertices added before $t_1$ have degree exceeding $t_0^{\ldots}\,t^{1/2}$. Then the total degree of vertices added before $t_0$ is less than $(k-1)\cdot(\ldots) + t_0 \cdot t_0^{\ldots}\,t^{1/2} < 2k\,(\ldots)$. But this contradicts the condition of Claim 1, which says that the total degree of vertices added before $t_0$ is at least $t_0^{\ldots}\,t^{1/2}$.

(Added before $t_1$) By Claim 2, all vertices added after time $t_1$ have degree at most $t_0^{\ldots}\,t^{1/2}$ whp. So the lower bound on $\Delta_i$ shows that the $k$ highest degree vertices are added before time $t_1$ whp.

□

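Claim. The $k$ highest degree vertices of $G_t^m$ have $\Delta_i \le \Delta_{i-1} - t^{1/2}/f(t)$ for $2 \le i \le k$ whp.

Proof. Let $A_4$ denote the event that there are 2 vertices among the first $t_1$ with degrees exceeding $t_0^{-1}t^{1/2}$ and within $t^{1/2}/f(t)$ of each other.

Let $p_{\ell,s_1,s_2} = \Pr\big[d_t(s_1) - d_t(s_2) = \ell \mid \overline{A_3}\big]$, for $|\ell| \le t^{1/2}/f(t)$. Then

$$\Pr[A_4 \mid \overline{A_3}] \le \sum_{1\le s_1<s_2\le t_1}\ \sum_{\ell=-t^{1/2}/f(t)}^{t^{1/2}/f(t)} p_{\ell,s_1,s_2}.$$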
High Degree Vertices and Eigenvalues in the Preferential Attachment Graph 269 



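Bounding each $p_{\ell,s_1,s_2}$ by a sum over the possible degrees $d_1, d_2 \ge 1$ of $s_1$ and $s_2$ (which, on $\overline{A_3}$, are at most $t_0^{1/6}t^{1/2}$), we have

$$\Pr[A_4 \mid \overline{A_3}] \le \sum_{1\le s_1<s_2\le t_1}\ \sum_{\ell=-t^{1/2}/f(t)}^{t^{1/2}/f(t)} p_{\ell,s_1,s_2} = o(1).$$

So

$$\Pr[A_4] = \Pr[A_4 \mid A_3]\Pr[A_3] + \Pr[A_4 \mid \overline{A_3}]\Pr[\overline{A_3}] \le \Pr[A_3] + \Pr[A_4 \mid \overline{A_3}] = o(1).$$

□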



2.2 Proof of Theorem 2

We partition the vertices into 3 sets; let $S_i$ be the vertices added after time $t_{i-1}$ and at or before time $t_i$, for

$$t_0 = 0,\qquad t_1 = t^{1/8},\qquad t_2 = t^{9/16},\qquad t_3 = t.$$

To reduce the number of subscripts necessary, we use $G$ to denote the graph $G_t^m$. For any graph $H$, we let $M_H$ denote the adjacency matrix of $H$, and we let $\lambda_i(H)$ denote the $i$-th largest eigenvalue of $M_H$. We will use the identity (Rayleigh's Principle)

$$\lambda_i(H) = \min_{L}\ \max_{v\in L,\ v\ne 0}\frac{v^{T}M_Hv}{v^{T}v}, \tag{1}$$

where $L$ ranges over all $(n-i+1)$-dimensional subspaces of $\mathbb{R}^n$. (See, for example, [Str88].)

Our approach, as in [MP02], [CLV], is to show that whp $G$ contains a star forest $F$ with stars of degree asymptotic to the degrees of the maximum degree vertices of $G$. Then we will show that $G \setminus F$ has small eigenvalues, and conclude that the large eigenvalues of $G$ cannot be too different from the large eigenvalues of $F$.

To do this, we need reasonable bounds on the degrees and codegrees in $G$. Recall that $d_s^m(r)$ is the degree at time $s$ of the vertex added at time $r$ with contractions of size $m$.
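The degrees $d_s^m(r)$ come from a process in which each new vertex of an $m = 1$ graph attaches to an earlier vertex with probability proportional to its degree (or to itself), after which blocks of $m$ consecutive vertices are contracted. Below is a minimal sketch of that construction. The exact model is defined earlier in the paper, so the attachment rule here (a Bollobás–Riordan-style rule with self-loops) and all function names are assumptions made for illustration.

```python
import random
from collections import Counter


def pa_graph_m1(t, rng):
    """Sketch of an m = 1 preferential attachment process on vertices 1..t.

    Vertex s sends one edge to an earlier vertex r with probability
    deg(r) / (2s - 1), or forms a loop with probability 1 / (2s - 1).
    """
    endpoints = []                      # each vertex appears once per unit of degree
    edges = []
    for s in range(1, t + 1):
        endpoints.append(s)             # the new edge's own endpoint at s
        target = rng.choice(endpoints)  # uniform over the 2s - 1 entries
        endpoints.append(target)
        edges.append((s, target))
    return edges


def contract(edges, m):
    """Merge vertices (i-1)m+1, ..., im of the m = 1 graph into vertex i."""
    return [((a - 1) // m + 1, (b - 1) // m + 1) for a, b in edges]


def degrees(edges):
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1                     # a loop adds 2 to its vertex's degree
    return deg


rng = random.Random(1)
m, t = 3, 30000                         # G' has t vertices; the contraction has t/m
g = contract(pa_graph_m1(t, rng), m)
deg = degrees(g)
print(sorted(deg.values(), reverse=True)[:5])   # the 5 highest degrees
```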
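Claim. For any $\epsilon > 0$ and any $f(t)$ with $f(t) \to \infty$ as $t \to \infty$ the following holds whp: for all $s$ with $f(t) \le s \le t$, for all vertices $v \in G_t^m$, if $v$ was added at time $r$, then $d_s^m(v) \le s^{1/2+\epsilon}r^{-1/2}$.

Proof. We use Lemma 1 and the union bound. Let $\ell = \lceil 3/\epsilon\rceil$. Then

$$\Pr\Big[\exists\, s, r:\ d_s^m(r) > s^{1/2+\epsilon}r^{-1/2}\Big] \le \sum_{s=f(t)}^{t}\sum_{r=1}^{s}\Pr\Big[\big(d_s^m(r)\big)^{\ell} > \big(s^{1/2+\epsilon}r^{-1/2}\big)^{\ell}\Big] \le \sum_{s=f(t)}^{t}\sum_{r=1}^{s}\frac{\mathbf{E}\big[(d_s^m(r))^{\ell}\big]}{\big(s^{1/2+\epsilon}r^{-1/2}\big)^{\ell}} = O\Big(\sum_{s=f(t)}^{t}s^{1-\ell\epsilon}\Big),$$

where the last step uses the bound of Lemma 1 on the $\ell$-th moment, which is $O\big((s/r)^{\ell/2}\big)$. Since $\ell \ge 3/\epsilon$,

$$\sum_{s=f(t)}^{t}s^{1-\ell\epsilon} \le \int_{f(t)-1}^{\infty}x^{1-\ell\epsilon}\,dx = \frac{(f(t)-1)^{2-\ell\epsilon}}{\ell\epsilon-2} = o(1).$$

□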
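The last step of the proof compares a sum of a decreasing function with an integral. A quick numerical sanity check, using illustrative values of $\epsilon$ and $f(t)$ that we chose ourselves, confirms $\sum_{s=f}^{t}s^{1-\ell\epsilon} \le (f-1)^{2-\ell\epsilon}/(\ell\epsilon-2)$ with $\ell = \lceil 3/\epsilon\rceil$:

```python
import math


def tail_sum(f, t, ell, eps):
    """sum_{s=f}^{t} s^(1 - ell*eps), the quantity bounded at the end of Claim 6."""
    return sum(s ** (1 - ell * eps) for s in range(f, t + 1))


def integral_bound(f, ell, eps):
    """(f - 1)^(2 - ell*eps) / (ell*eps - 2); valid when ell*eps > 2 and f >= 2."""
    return (f - 1) ** (2 - ell * eps) / (ell * eps - 2)


eps = 0.25
ell = math.ceil(3 / eps)          # 12, so ell * eps = 3
for f in (10, 100, 1000):
    s, b = tail_sum(f, 10**5, ell, eps), integral_bound(f, ell, eps)
    print(f, s, b, s <= b)
```

Because $\ell\epsilon \ge 3$, the bound decays like $1/f(t)$, which is why any $f(t) \to \infty$ suffices.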

Claim. Let $S_3'$ be the set of vertices in $S_3$ which are adjacent to more than 1 vertex of $S_1$ in $G$. Then $|S_3'| < \ldots$ whp.

Proof. Let $B_1$ be the event that the conditions of Claim 6 hold with $f(t) = t_2$ and $\epsilon = 1/16$. Then for a vertex $v \in S_3$ added at time $s$,

$$\Pr\big[|N(v) \cap S_1| \ge 2 \mid B_1\big] \le \binom{m}{2}\big(s^{-7/16}t_1^{1/2}\big)^2.$$

Let $X$ denote the number of $v \in S_3$ adjacent to more than 1 vertex of $S_1$. Then

$$\mathbf{E}[X \mid B_1] \le \binom{m}{2}t_1\sum_{s=t_2+1}^{t}s^{-7/8} \le m^2t_1\int_0^t x^{-7/8}\,dx = 8m^2t_1t^{1/8}.$$

We finish the claim with Markov's inequality,

Pr[X > I Bi] < E[X I = o(l). 



□ 

Now, let $F \subseteq G$ be the star forest consisting of edges between $S_1$ and $S_3 \setminus S_3'$.

Claim. Let $\Delta_1 \ge \Delta_2 \ge \cdots \ge \Delta_k$ denote the degrees of the $k$ highest degree vertices of $G$. Then $\lambda_i(F) = (1 - o(1))\Delta_i^{1/2}$ whp.

Proof. Let $H$ be the star forest $H = K_{1,d_1} \cup K_{1,d_2} \cup \cdots \cup K_{1,d_k}$ with $d_1 \ge d_2 \ge \cdots \ge d_k$. Then for $i = 1, \ldots, k$, $\lambda_i(H) = d_i^{1/2}$. So it is sufficient to show that $\Delta_i(F) = (1 - o(1))\Delta_i(G)$ for $i = 1, \ldots, k$.

Claim 4 shows that the $k$ highest degree vertices of $G$ are added before time $t_1$, so these vertices are all in $F$. The only edges to these vertices that are not in $F$ are those added before time $t_2$ and those incident to $S_3'$. By Theorem 1 we have $\Delta_1(G_{t_2}) \le t_2^{1/2}f(t)$ and, also by Theorem 1, $\Delta_i(G) \ge t^{1/2}/f(t)$ for $i = 1, \ldots, k$, whp. Claim 7 bounds $|S_3'|$ whp, and so whp

$$\Delta_i(F) \ge \Delta_i(G) - \Delta_1(G_{t_2}) - m|S_3'| = (1 - o(1))\Delta_i(G).$$

□

Let $H = G \setminus F$. We complete the proof of Theorem 2 by showing that $\lambda_1(H)$ is small.

Claim. $\lambda_1(H) \le 6mt^{15/64}$ whp.

Proof. We bound the eigenvalues of $H$ in 6 parts. Let

$$H_i = H[S_i], \qquad H_{ij} = H(S_i, S_j),$$

where $H[S]$ is the subgraph of $H$ induced by the vertex set $S$, and $H(S,T)$ is the subgraph containing only edges with one vertex in $S$ and the other in $T$.

To bound $\lambda_1(H_i)$ we use the fact that the maximum eigenvalue of a graph is at most the maximum degree of the graph. This is easily verified from (1).

We use Claim 6 with $f(t) = t_1$ and $\epsilon = 1/64$ to conclude that whp

$$\lambda_1(H_1) \le \max_{v\le t_1}\{d_{t_1}^m(v)\} \le t_1^{33/64} = t^{33/512},$$

$$\lambda_1(H_2) \le \max_{t_1<v\le t_2}\{d_{t_2}^m(v)\} \le t_2^{33/64}t_1^{-1/2} = t^{233/1024},$$

$$\lambda_1(H_3) \le \max_{t_2<v\le t}\{d_{t}^m(v)\} \le t^{33/64}t_2^{-1/2} = t^{15/64}.$$

To bound $\lambda_1(H_{ij})$, we begin by considering the case $m = 1$. Then, for $i < j$, each vertex in $S_j$ has at most 1 edge in $H_{ij}$, so $H_{ij}$ is a star forest. As observed in Claim 8, the eigenvalues of a star forest are directly related to the degrees of the stars.

When $m > 1$, we let $G'$ denote a preferential attachment graph with $t$ edges and $m = 1$. Recall that by contracting vertices $\{(i-1)m+1, \ldots, im\}$ into a single vertex $i$, we obtain a graph identically distributed with $G$. There is a simple representation of this observation in terms of linear algebra: we can write the adjacency matrix of $G$ in terms of the adjacency matrix of the graph $G'$:

$$M_G = C_m^{T}M_{G'}C_m,$$

where $C_m$ is the $t \times t/m$ matrix with $i$-th column

$$\big[\,\underbrace{0\cdots0}_{(i-1)m}\ \underbrace{1\cdots1}_{m}\ \underbrace{0\cdots0}_{(t/m-i)m}\,\big]^{T}.$$

Similarly, we can write the adjacency matrix of $H_{ij}$ in terms of the adjacency matrix of $H_{ij}'$ using this "contraction matrix" $C_m$.

Note that for $w = C_mv$ we have $w^{T}w = m(v^{T}v)$. So

$$\lambda_1(H_{ij}) = \max_{v\ne0}\frac{v^{T}M_{H_{ij}}v}{v^{T}v} = \max_{v\ne0}\frac{v^{T}C_m^{T}M_{H_{ij}'}C_mv}{v^{T}v} = \max_{w=C_mv\ne0}m\,\frac{w^{T}M_{H_{ij}'}w}{w^{T}w} \le m\max_{w\ne0}\frac{w^{T}M_{H_{ij}'}w}{w^{T}w} = m\lambda_1(H_{ij}').$$
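The contraction identity and the norm relation $w^Tw = m(v^Tv)$ can be checked on a toy example. The sketch below uses plain Python lists and a small hand-made symmetric matrix standing in for $M_{G'}$ (our choice, not a graph from the paper). Note that in $C_m^TM_{G'}C_m$ an edge inside a contracted block shows up as a diagonal entry 2.

```python
def matmul(A, B):
    """Plain-list matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def contraction_matrix(t, m):
    """The t x (t/m) matrix C_m: column c has ones exactly in rows cm, ..., cm + m - 1 (0-based)."""
    return [[1 if r // m == c else 0 for c in range(t // m)] for r in range(t)]


# Adjacency matrix of a small G' with t = 6 vertices (the m = 1 graph).
Mp = [[0, 1, 0, 0, 1, 0],
      [1, 0, 1, 0, 0, 0],
      [0, 1, 0, 1, 0, 1],
      [0, 0, 1, 0, 0, 0],
      [1, 0, 0, 0, 0, 1],
      [0, 0, 1, 0, 1, 0]]
m = 2
C = contraction_matrix(len(Mp), m)
MG = matmul(matmul(transpose(C), Mp), C)   # M_G = C_m^T M_{G'} C_m
print(MG)   # [[2, 1, 1], [1, 2, 1], [1, 1, 2]]

# w = C_m v repeats each entry of v m times, so w^T w = m (v^T v).
v = [1, -2, 3]
w = [sum(C[r][c] * v[c] for c in range(len(v))) for r in range(len(C))]
print(sum(x * x for x in w), m * sum(x * x for x in v))   # 28 28
```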

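We use Claim 6 with $f(t) = t_1$ and $\epsilon = 1/64$ as above to conclude that whp

$$\lambda_1(H_{12}')^2 \le \max_{v\le t_1}\{d_{t_2}^m(v)\} \le t_2^{33/64} = t^{297/1024},$$

$$\lambda_1(H_{23}')^2 \le \max_{t_1<v\le t_2}\{d_{t}^m(v)\} \le t^{33/64}t_1^{-1/2} = t^{29/64}.$$

Finally, all edges in $H_{13}'$ are between $S_1$ and $S_3'$, so Claim 7 bounds $\lambda_1(H_{13}')$. We now conclude that whp

$$\lambda_1(H_{ij}) \le m\lambda_1(H_{ij}') \le mt^{15/64}\qquad\text{for } 1 \le i < j \le 3,$$

and so whp

$$\lambda_1(H) \le \sum_{i=1}^{3}\lambda_1(H_i) + \sum_{i<j}\lambda_1(H_{ij}) \le 6mt^{15/64}.$$

□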
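The exponents in the proof of the last claim can be double-checked with exact rational arithmetic. The sketch assumes $t_1 = t^{1/8}$ and $t_2 = t^{9/16}$; these reproduce every exponent displayed above. It uses the Claim 6 bound $d_s(v) \le s^{1/2+\epsilon}r^{-1/2}$ with $\epsilon = 1/64$, and for the star-forest terms $H_{12}', H_{23}'$ it takes a square root of the maximum degree.

```python
from fractions import Fraction as Fr

# Exponents of t, assuming t1 = t^(1/8) and t2 = t^(9/16).
A1, A2 = Fr(1, 8), Fr(9, 16)
EPS = Fr(1, 64)           # epsilon used with Claim 6
DEG = Fr(1, 2) + EPS      # Claim 6: d_s(v) <= s^(1/2 + eps) * r^(-1/2)


def degree_exp(s_exp, r_exp):
    """Exponent of t in s^(1/2+eps) * r^(-1/2) when s = t^s_exp and r = t^r_exp."""
    return DEG * s_exp - Fr(1, 2) * r_exp


bounds = {
    "H1": degree_exp(A1, 0),           # max over v <= t1 of d_{t1}(v)
    "H2": degree_exp(A2, A1),          # max over t1 < v <= t2 of d_{t2}(v)
    "H3": degree_exp(1, A2),           # max over t2 < v <= t of d_t(v)
    "H12": degree_exp(A2, 0) / 2,      # square root of a star-forest degree
    "H23": degree_exp(1, A1) / 2,
}
for name, e in bounds.items():
    print(name, e, e <= Fr(15, 64))    # every part is at most t^(15/64)
```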
Abstract. We study two natural models of randomly generated constraint satisfaction problems. We determine how quickly the domain size must grow with n to ensure that these models are robust in the sense that they exhibit a non-trivial threshold of satisfiability, and we determine the asymptotic order of that threshold. We also provide resolution complexity lower bounds for these models.



1 Introduction 

The Constraint Satisfaction Problem (CSP) is a fundamental problem in Artificial Intelligence, with applications ranging from scene labeling to scheduling and knowledge representation. See for example Dechter [12], Mackworth [18] and Waltz [26]. An instance of the CSP comprises a set of $n$ variables, each taking a value in some given domain, and a set of constraint relations, each of which determines the permitted joint values of a given subset of the variables. The problem is either to find a set of values for the variables which respects all the constraint relations, or to determine that none exists. In recent years, there has been a strong interest in studying the relationship between the input parameters that define an instance of CSP (e.g. number of variables, domain sizes, tightness of constraints) and certain solution characteristics, such as the likelihood that the instance has a solution or the difficulty with which a solution may be discovered. An extensive account of relevant results, both experimental and theoretical, can be found in Hogg, Hubermann and Williams [15].

One of the most commonly used practices for conducting experiments with CSP is to generate a large set of random instances, all with the same defining parameters, and then for each instance in the set to use heuristics for deciding if a solution exists. Note that, in general, CSP is NP-complete. The proportion of random instances that have a solution is used as an indication of the likelihood that an instance will be soluble, and the average time taken per instance (by
* Supported in part by NSF grant CCR0200945. Research carried out during a visit to Microsoft Research, Theory Group.

S. Arora et al. (Eds.): APPROX 2003+RANDOM 2003, LNCS 2764, pp. 275-289, 2003.
some standard algorithm) gives some measure of the hardness of such instances. A characteristic of many of these experiments is that the fraction of assignments of values that are permissible for each constraint is kept constant as the number of variables increases. The very active experimental study of random models of CSP has necessitated a rigorous analysis of such models. Various models of random CSPs for which $m$, the domain-size, is constant have been studied in several papers, for example [2,21,11,22,23]. One of the earliest such studies, [2], discovered that the most natural models suffer a fatal flaw (described below). The first study of the case where $m$ grows with $n$ was [13], where one of these most natural models was studied. Implicit in that study was the fact that for certain settings of the relevant parameters, the fatal flaw did not occur and we had a rich random model to study. One of the main contributions of this paper is to determine which parameter settings avoid that fatal flaw, and thus provide random models that are both natural and robust.

In this paper we consider only binary CSPs (BCSPs). These can be succinctly described in the following way: A graph $G = (V, E)$ is given, where $V = \{x_1, x_2, \ldots, x_n\}$ denotes the set of variables of the problem, and $E$ the set of binary relations of the instance. We assume, without loss of generality, that each variable can take values in the same set $[m] = \{1, 2, \ldots, m\}$. For each edge $e = \{x_i, x_j\} \in E$, the relation can then be represented by an $m \times m$ 0-1 matrix $M_e$, where 0 indicates that the pair of values is forbidden and 1 that it is allowed. A solution to the associated BCSP is an assignment $f : V \to [m]$ of values to the variables, such that $M_e(f(x_i), f(x_j)) = 1$ for all $e = \{x_i, x_j\} \in E$.
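The definition of a solution translates directly into a check over the edges. In this minimal sketch (our illustration; the data layout and names are hypothetical), each edge is stored with a fixed orientation $(x_i, x_j)$ so that row and column indices of $M_e$ are unambiguous:

```python
def is_solution(edges, f):
    """Check an assignment for a binary CSP.

    edges: dict mapping an oriented pair (x_i, x_j) to its m x m 0-1 matrix M_e,
           indexed [value of x_i - 1][value of x_j - 1].
    f:     dict mapping each variable to a value in 1..m.
    """
    return all(M[f[x] - 1][f[y] - 1] == 1 for (x, y), M in edges.items())


# Two constraints on three variables with domain [m] = {1, 2}.
edges = {
    ("x1", "x2"): [[0, 1],
                   [1, 0]],      # x1 and x2 must differ
    ("x2", "x3"): [[1, 0],
                   [0, 0]],      # only the pair (1, 1) is allowed
}
print(is_solution(edges, {"x1": 2, "x2": 1, "x3": 1}))   # True
print(is_solution(edges, {"x1": 1, "x2": 2, "x3": 1}))   # False
```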

The aim of this paper is to conduct a probabilistic analysis of some aspects 
of the following simple random models of BCSP: 

Model A: The underlying graph $G$ is $G_{n,p_1}$ for some $p_1 = p_1(n) \le 1$ where $p_1 \ne o(1/n)$. (This means that, with $V = \{x_1, x_2, \ldots, x_n\}$, we let each of the $\binom{n}{2}$ possible edges occur independently in $E$ with probability $p_1$.) We let $d = np_1$. For each edge $e$ of $G$ there is a random $m \times m$ constraint matrix $M_e$ where $M_e(i,j) = 1$ or $0$ independently with probability $p_2$ or $q_2 = 1 - p_2$ respectively, for some constant $0 < p_2 < 1$. (In the final paper we will consider $p_2 \to 0$ and $p_2 \to 1$ as well.)

For $p_1 = o(1/n)$, the graph $G_{n,p_1}$ is very sparse, and consists of a collection of small vertex-disjoint trees in which all but $o(n)$ of the vertices have degree 0. This is why we restrict our attention to $p_1 \ne o(1/n)$.
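Model A is straightforward to sample. The sketch below is illustrative: the parameter values and helper names are ours. It also counts the edges whose matrix is entirely zero, which become important below:

```python
import random


def model_a(n, p1, m, p2, rng):
    """Sample Model A: G = G_{n,p1}; every edge gets an m x m 0-1 matrix whose
    entries are 1 independently with probability p2 (0 with probability q2 = 1 - p2)."""
    inst = {}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p1:
                inst[(i, j)] = [[1 if rng.random() < p2 else 0 for _ in range(m)]
                                for _ in range(m)]
    return inst


def blocked_edges(inst):
    """Edges whose constraint matrix is all zero; any such edge makes the CSP unsatisfiable."""
    return [e for e, M in inst.items() if not any(any(row) for row in M)]


rng = random.Random(7)
n, d = 400, 3.0                                  # d = n * p1
inst = model_a(n, d / n, m=2, p2=0.5, rng=rng)
print(len(inst), len(blocked_edges(inst)))       # number of edges, number of blocked edges
```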

Given $m, p_2$ we wish to know: for what values of $p_1$ is our random CSP almost surely satisfiable? This question has been asked for many similar models of CSP, SAT and other problems. Traditionally, one of the first steps is to determine some values of $p_1$ for which it is not satisfiable, as follows:

Fact: For $p_1$ a sufficiently large constant multiple of $\ln m/n$, the random CSP is unsatisfiable whp.

The proof follows easily from the fact that the expected number of satisfying solutions is $m^n(1 - p_1q_2)^{\binom{n}{2}}$.
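A short numerical sketch of this first moment computation follows (the parameter values are illustrative choices of ours). Write $p_1 = c\ln m/(q_2n)$. Then $\ln\big(m^n(1-p_1q_2)^{\binom n2}\big) \approx \ln m\,\big(n - c(n-1)/2\big)$, so the expectation is huge for $c = 1$ and vanishingly small for $c = 3$:

```python
import math


def log_expected_solutions(n, m, p1, q2):
    """ln of m^n (1 - p1*q2)^(n choose 2): each of the (n choose 2) pairs is,
    independently, an edge (prob. p1) whose constraint forbids the chosen values (prob. q2)."""
    return n * math.log(m) + math.comb(n, 2) * math.log1p(-p1 * q2)


n, m, q2 = 10**5, 50, 0.5
results = {}
for c in (1.0, 3.0):                      # p1 = c * ln(m) / (q2 * n)
    p1 = c * math.log(m) / (q2 * n)
    results[c] = log_expected_solutions(n, m, p1, q2)
    print(c, results[c])                  # positive for c = 1, negative for c = 3
```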

Inspired by a familiar pattern of similar random models, it is tempting to assume that $\ln m/n$ is the asymptotic order of a so-called "satisfiability threshold" and so hypothesize that:

Hypothesis A: There is some constant $c > 0$ so that for $p_1 \le c\ln m/n$ the random CSP is satisfiable whp.

See [16] for a lengthy list of papers in which the authors succumbed to the temptation of assuming an equivalent hypothesis. In [2], it was observed that for most of those papers, and in fact whenever $m, p_2$ are both constants, the hypothesis is wrong. In fact, if $p_1 > \omega(n)/n^2$ for any $\omega(n)$ that tends to infinity with $n$, then almost surely the random CSP is trivially unsatisfiable in the sense that it has an edge whose constraint forbids every pair of values; we call such an edge a blocked edge.

In this paper we asymptotically determine which values of $m$ meet Hypothesis A.

Theorem 1. (a) If $m \le (1-\epsilon)\sqrt{\ln(nd)/\ln(1/q_2)}$ for some constant $\epsilon > 0$, then provided $nd \to \infty$, the random CSP has a blocked edge whp.
(b) If $m \ge (1+\epsilon)\sqrt{\ln(nd)/\ln(1/q_2)}$ then there is some constant $c > 0$ so that for $p_1 \le c\ln m/n$, the random CSP is satisfiable whp. Furthermore, an assignment can be found in $O(mn)$ time whp.

For $m, p_2$ as in case (b), Hypothesis A holds, and so $\ln m/n$ is, indeed, the order of the satisfiability threshold. In case (a), whp the fact that the random CSP is unsatisfiable can be demonstrated easily by examining a single edge. We show that for $m \ge (\ln n)^{1+\epsilon}$ for any $\epsilon > 0$, this is far from the case. In particular, we show that whp there is no short resolution proof of unsatisfiability when $p_1$ is of the same asymptotic order as the threshold of satisfiability.

Theorem 2. If $m \ge (\ln n)^{1+\epsilon}$ and $d = c\ln m$, for any constants $\epsilon, c > 0$, then whp
the resolution complexity of the random CSP is 

The resolution complexity of various models of random boolean formulae has been well-studied, starting with [10], and continuing through [4], [5], [3] and other
papers. This line of inquiry was first extended to random models of CSP in 
[20,19] and was then continued in [23]. In both of those studies, the domain-size 
was constant. Our Theorem 2 is the first result on the resolution complexity for 
a model of random CSP where the domain-size grows with n. 

We now consider another model. 

Model B: Here we generate a random $m \times m$ symmetric matrix $M$ with density $p_2$ and put $M_e = M$ for every edge of $G = G_{n,p_1}$.

Theorem 3. Let $\epsilon$ be a small positive constant, and consider a random CSP from Model B.

(a) If $d \le (4-\epsilon)(\ln(1/q_2))^{-1}\ln m\ln\ln m$ then whp the CSP is satisfiable.

(b) If $d \le (1-\epsilon)(\ln(1/q_2))^{-1}\ln m\ln\ln m$ then an assignment can be found in polynomial time whp.

(c) If $0 < q_2 < 1$ is constant and if $d \ge K\ln m\ln\ln m$ for sufficiently large $K$ then whp the CSP is unsatisfiable.

We can prove high resolution complexity in a restricted range of $d, m, p_2$.

Theorem 4. If $m \to \infty$ and $d = c\ln m\ln\ln m$ for some constant $c > 0$, then
whp the resolution complexity of a random CSP from Model B is ™)). 



278 Alan Frieze and Michael Molloy 



2 Model A: Unsatisfiable Region 

2.1 Blocked Edges and Vertices 

Let an edge $e = (x, y)$ of $G$ be blocked if $M_e = O$ (the matrix with all zero entries). Of course, any CSP with a blocked edge is unsatisfiable, since there is no possible consistent assignment to $x, y$. We start with a simple lemma:

Lemma 1. Let $\epsilon > 0$ be a small positive constant and assume that $nd \to \infty$ (so that whp $G$ has edges). Let $m_0 = \sqrt{(\ln n + \ln d)/\ln(1/q_2)}$. Then

(a) $m \ge (1+\epsilon)m_0$ implies that there are no blocked edges, whp.

(b) $m \le (1-\epsilon)m_0$ implies that there are blocked edges, whp.

Proof. Let $Z$ be the number of blocked edges in our instance. Given the graph $G$, the distribution of $Z$ is $\mathrm{Bin}(|E|, q_2^{m^2})$. So

$$\mathbf{E}(Z) = \binom{n}{2}p_1q_2^{m^2}. \tag{1}$$

If $m \ge (1+\epsilon)m_0$ then (1) implies that

$$\mathbf{E}(Z) \le (nd)^{-\epsilon} \to 0$$

and then $Z = 0$ whp and (a) follows.

If $m \le (1-\epsilon)m_0$ then (1) implies that

$$\mathbf{E}(Z) \ge \tfrac{1}{2}(nd)^{\epsilon} \to \infty.$$

Part (b) now follows from the Chernoff bounds.

This proves Theorem 1(a). □
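To make the threshold concrete, here is a small numerical sketch evaluating $m_0$ and (1) for one illustrative choice of $n, d, q_2$ (our values, not the paper's). With $m_0 \approx 4.82$, taking $m = 4$ gives dozens of expected blocked edges while $m = 6$ gives essentially none:

```python
import math


def m0(n, d, q2):
    """m_0 = sqrt((ln n + ln d) / ln(1/q2)) from Lemma 1."""
    return math.sqrt((math.log(n) + math.log(d)) / math.log(1 / q2))


def expected_blocked_edges(n, d, m, q2):
    """E(Z) = (n choose 2) * p1 * q2^(m^2) with p1 = d/n: an edge is blocked
    when all m^2 entries of its constraint matrix are 0."""
    return math.comb(n, 2) * (d / n) * q2 ** (m * m)


n, d, q2 = 10**6, 10.0, 0.5
print(m0(n, d, q2))                            # about 4.82
for m in (4, 6):                               # about 0.83 * m0 and 1.24 * m0
    print(m, expected_blocked_edges(n, d, m, q2))
```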

We now consider another simple cause of unsatisfiability that [2] also discovered to be prevalent amongst the models commonly used for experimentation. We say that a vertex (variable) $x$ is blocked if for every possible assignment $i \in [m]$ there is some neighbour $y$ which blocks the assignment of $i$ to $x$, because the $i$th row of $M_e$, $e = (x, y)$, is all zero.

Lemma 2. Let $\epsilon$ be a small positive constant, and suppose that $m - \sqrt{\ln n/\ln(1/q_2)} \to \infty$. Then

(a) $m \ge (1+\epsilon)\sqrt{(\ln n + m\ln d)/\ln(1/q_2)}$ implies that there are no blocked vertices, whp.

(b) $m \le (1-\epsilon)\sqrt{(\ln n + m\ln d)/\ln(1/q_2)}$ implies that there are blocked vertices, whp.

Remark: Note that $m = \sqrt{(\ln n + m\ln d)/\ln(1/q_2)}$ for $m$ slightly smaller than $m_0$ from Lemma 1.

Proof. If the graph $G$ is given and vertex $x$ has degree $d_x$ then

$$\Pr(x \text{ is blocked} \mid G) = \big(1 - (1 - q_2^m)^{d_x}\big)^m.$$

This is because for $i \in [m]$, $(1 - q_2^m)^{d_x}$ is the probability that no neighbour $w$ of $x$ is such that row $i$ of $M_{(x,w)}$ is all zero.
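The formula rests on independence across the values $i$ and the neighbours $w$, because distinct rows of distinct matrices use disjoint entries. A small Monte Carlo sketch illustrates this; the parameter values and names are illustrative only:

```python
import random


def prob_blocked(dv, m, q2):
    """Pr(x is blocked | G) = (1 - (1 - q2^m)^dv)^m for a vertex of degree dv."""
    return (1 - (1 - q2 ** m) ** dv) ** m


def simulate_blocked(dv, m, q2, trials, rng):
    """Monte Carlo estimate of the same probability.

    Value i is blocked if, for some neighbour w, row i of M_(x,w) is all zero;
    each matrix entry is 0 independently with probability q2.
    """
    hits = 0
    for _ in range(trials):
        blocked = all(                                     # every value i in [m] ...
            any(                                           # ... is blocked by some neighbour
                all(rng.random() < q2 for _ in range(m))   # whose row i is all zero
                for _ in range(dv))
            for _ in range(m))
        hits += blocked
    return hits / trials


rng = random.Random(3)
exact = prob_blocked(3, 2, 0.5)
estimate = simulate_blocked(3, 2, 0.5, 20000, rng)
print(exact, estimate)                                    # exact = 0.334228515625
```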

Part (a) now follows from an easy first moment calculation, which we omit. We turn our attention to proving part (b). Rearranging our assumption gives a lower bound on $\ln d$ in terms of $m\ln(1/q_2)$ and $\ln n$. So we choose $d$ so that this bound holds with equality, as proving the result for that value of $d$ clearly implies that it holds for all larger values.

Our assumption implies that $d \to \infty$ and so whp $n - o(n)$ vertices $v$ have $d_v \in I = [(1-\epsilon)d, (1+\epsilon)d]$. Thus if $Z$ is the number of blocked vertices with $d_v \in I$ then

$$\mathbf{E}(Z) \ge (n - o(n))\Big(1 - (1 - q_2^m)^{(1-\epsilon)d}\Big)^m \to \infty,$$

where the last step uses the choice of $d$ (see the Remark preceding this proof).

To show that $Z \ne 0$ whp we use Talagrand's inequality [25]. We condition on $G$. Then we let each $M_e$, $e \in E(G)$, be an independent copy of $\{0,1\}^{m^2}$ (the set of $m \times m$ 0-1 matrices). Now changing a single $M_e$ can change $Z$ by at most 2 and so Assumption 1 holds with $a = 2$. Then to show that a vertex $v$ is blocked we only have to expose $M_e$ for $e$ incident with $v$. Thus Assumption 2 holds with $c(s) = (1+\epsilon)ds$. Thus if $M = \mathrm{Med}(Z)$, the inequality gives

Pr(|Z -M\> + e)dM^^^) < (2) 

for any f > 0. 

Our assumptions imply that = o(E(Z)) and so (2) implies the result. □ 



3 Model A: Satisfiable Region 



We assume for this section that

$$m = (1+\epsilon)\sqrt{\frac{\ln(nd)}{\ln(1/q_2)}},\qquad d = c\ln m \quad\text{and}\quad p_2 \text{ is constant},$$

where $c, \epsilon$ are small. (Note that this also implies the result for larger $m$.)

Now let a vertex $v$ be troublesome if it has degree $> D = 10d$ or there are assignments to its neighbours which leave $v$ without a consistent assignment. Let $T$ denote the set of troublesome vertices. A subset of $T$ is called a troublesome set.

Let $\mathcal{A}$ be the event that every set of $k_0$ vertices contains at most $k_0$ edges, where

$$k_0 = \frac{2\ln n}{d}.$$
d 
Lemma 3. $\Pr(\mathcal{A}) = 1 - o(1)$.

Proof.

$$\Pr(\bar{\mathcal{A}}) \le \binom{n}{k_0}\binom{\binom{k_0}{2}}{k_0+1}\Big(\frac{d}{n}\Big)^{k_0+1} \le \Big(\frac{ne}{k_0}\Big)^{k_0}\Big(\frac{k_0^2e}{2(k_0+1)}\cdot\frac{d}{n}\Big)^{k_0+1} = o(1).$$

□



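We show next that whp the sub-graph induced by $T$ has no large trees.

Lemma 4. Whp there are no troublesome trees with $\ge k_0$ vertices.

Proof. If $T$ contains a tree of size greater than $k_0$ then it contains one of size $k_0$. Let $Z$ be the number of troublesome trees with $k_0$ vertices. Let $\Omega$ be the set of trees/unicyclic graphs spanning $[k_0]$. Then for any subset $J$ of $[k_0]$ we may write

$$\mathbf{E}(Z1_{\mathcal{A}}) \le \binom{n}{k_0}\sum_{\mathcal{T}\in\Omega}\Pr(\mathcal{G}_{\mathcal{T}})\prod_{i\in J}\Pr\big(x_i \in T \mid \mathcal{G}_{\mathcal{T}},\ x_j \in T\ \forall j \in J,\ j < i\big). \tag{3}$$

Here $\mathcal{G}_{\mathcal{T}}$ is the event that the sub-graph of $G$ induced by $[k_0]$ is $\mathcal{T}$.

Fix $\mathcal{T} \in \Omega$ and let $I_1$ be the set of vertices of $\mathcal{T}$ with degree at most 4 in $\mathcal{T}$. Then $|I_1| \ge k_0/2$. Note next that $I_1$ contains an independent set $I$ of size at least $k_0/10$.

Now if $i \in I$ then

$$\Pr\big(x_i \in T \mid \mathcal{G}_{\mathcal{T}}, x_1, x_2, \ldots, x_{i-1} \in T\big) \le \binom{n}{D-4}\Big(\frac{d}{n}\Big)^{D-4} + m^D\big(1 - p_2^D\big)^m.$$

The first term bounds the probability that $x_i$ has at least $D - 4$ neighbours outside the tree and, assuming the degree of $x_i$ is at most $D$, the second term bounds the probability that the $\le D$ neighbours have an assignment which cannot be extended to $x_i$. We use the fact that $I$ is an independent set to gain the stochastic independence we need.

Thus, applying (3) with $J = I$ we obtain

$$\mathbf{E}(Z1_{\mathcal{A}}) \le \binom{n}{k_0}\sum_{\mathcal{T}\in\Omega}\Pr(\mathcal{G}_{\mathcal{T}})\left(\binom{n}{D-4}\Big(\frac{d}{n}\Big)^{D-4} + m^D\big(1 - p_2^D\big)^m\right)^{k_0/10} = o(1). \tag{4}$$

□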



Now we deal with troublesome cycles in a similar manner.

Lemma 5. Whp there are no troublesome cycles.

Proof. It follows from Lemma 4 that we need only consider cycles of length $k_0$ or less. If $Z$ now denotes the number of troublesome cycles of length $k_0$ or less then arguing as in (3), (4) we see that

$$\mathbf{E}(Z) \le \sum_{k=3}^{k_0}n^k\Big(\frac{d}{n}\Big)^k\left(\binom{n}{D-2}\Big(\frac{d}{n}\Big)^{D-2} + m^D\big(1 - p_2^D\big)^m\right)^{\lfloor k/2\rfloor} = o(1).$$

□

Let a tree be small if it contains at most $k_0$ vertices.

We have therefore shown that whp the troublesome vertices $T$ induce a forest of small trees.

We show next that whp there are at most $n^{1+o(1)}$ small trees.

Lemma 6. Whp there are at most $n^{1+o(1)}$ small trees.

Proof. Let $\alpha_T$ denote the number of small trees. Then

$$\mathbf{E}(\alpha_T) \le \sum_{k=1}^{k_0}\binom{n}{k}k^{k-2}\Big(\frac{d}{n}\Big)^{k-1} \le n\sum_{k=1}^{k_0}(de)^k = n^{1+o(1)}.$$

The result now follows from the Markov inequality. □

Our method of finding an assignment to our CSP is to (i) make a consistent assignment to the vertices of $T$ first and then (ii) extend this assignment "greedily" to the non-troublesome vertices.

It is clear from the definition of troublesome that it is possible to carry out Step (ii). We wish to show that (i) can be carried out successfully whp. For this purpose we show that whp $G$ does not contain a small tree which cannot be given a consistent assignment.

So we fix a small tree $T$ and a vertex $v \in T$ and root $T$ at $v$. Then let $X_i$, $0 \le i \le k_0$, denote the vertices at distance $i$ from $v$ in $T$. Then let $d_\ell$ be the maximum number of descendants of a vertex in $X_\ell$ and let $L$ denote the depth of $T$.

For $u \in X_\ell$ let $S_\ell(u)$ be the set of values $\delta$ such that there is a consistent assignment to the sub-tree of $T$ rooted at $u$ in which $u$ receives $\delta$. We let $t = \lceil 10/\epsilon\rceil$ and define the events

Then for $1 \le j \le t$ let



TTii = max Pr 




Note that = 1. 
We claim that for $\ell \ge 1$,






iH 'rkt—di 

< - (1 - 



( 5 ) 



t=i 

t 



i=i 



( 6 ) 



Explanation of (5): Suppose that there are kj descendants w of u for which 
^w~,j occurs. If u G then r assignment values will be forbidden to it, < 

^ < Ini+ijTi. The product bounds the probability that these values are forbidden 
and that occurs for the corresponding descendants. 

Now let us prove by induction on $\ell$ that for $\eta = \epsilon/3$ and for $1 \le j \le t$ we have

TTjj < . (7) 

This is clearly true for £ = 0 since = 0 for j < £ and nt^o = 1. Then from (6) 
we obtain 






,.-rE 

i=i 






92 



— ’ 






< ^ n- (i+i)-^T^(i+^) . 

i=i 

Notice that in going from the first to second inequality we use the fact that since 

£, di < ko we find that ™ This term is then absorbed by using 

1 + e/2 in place of 1 + e. 

Now consider the expression 



^ ^ + 2 ^ + 

(7 — 1)(£ — i) e, i — j,^ s 
= ^ + 2 ^ + + ^)- 

To complete the inductive proof of (7) we have only to show that $A$ is non-negative.

Now $A$ is clearly non-negative if $i \ge j$ and so assume that $j > i$. Now for a fixed $j$, $A$ can be thought of as a linear function of $i$ and so we need only check non-negativity for $i = 1$ or $i = j - 1$.

For $i = 1$ we need

(j-l)(£-l)(l+|)>(j-l)£(l + 7y) (8) 



and this holds for $\epsilon < 1$.
For $i = j - 1$ we need the analogous inequality. But here $j \ge 2$ and the LHS is at least $(\ell-1)(1+\epsilon/2)$, and the inequality reduces to (8) (after dividing through by $j - 1$). This completes the proof of (7). In particular



$$\Pr(\exists \text{ a troublesome tree which cannot be consistently assigned}) = o(1),$$

which implies that Step (i) can be completed whp. This proves the satisfiability claim in Theorem 1(b).

It only remains to discuss the time to find an assignment. Once we have assigned values to $T$ then we can fill in an assignment in $O(mn)$ time. So let us now fix a small tree $T$ of troublesome vertices. Choose a root $v \in T$ arbitrarily. Starting at the lowest levels we compute the set of values $S_\ell(u)$ available to a vertex $u \in X_\ell$. For each descendant $w$ of $u$ we compute $T_\ell(w) = \{\sigma \in [m] : M_{(u,w)}(\sigma, a) = 1 \text{ for some } a \in S_{\ell+1}(w)\}$ and then we have $S_\ell(u) = \bigcap_w T_\ell(w)$. At the leaves, $S_L = [m]$, and so in this way we can assign a value to the root and then work back down the tree to the leaves giving an assignment to the whole of $T$. Thus the whole algorithm takes $O(mn)$ time as claimed. □
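The two passes just described can be sketched as follows: the sets $S_\ell(u)$ are computed from the leaves up, then values are fixed from the root down. This is our illustration, not the authors' code. Dictionaries stand in for the tree and the matrices $M_e$, and all names are hypothetical. Building each $T_\ell(w)$ costs $O(m^2)$ per tree edge in this naive form.

```python
def solve_tree(children, root, M, m):
    """Bottom-up / top-down assignment for a binary CSP whose graph is a rooted tree.

    children: dict vertex -> list of its children (leaves may be omitted).
    M:        dict (parent, child) -> m x m 0-1 matrix, indexed [parent value - 1][child value - 1].
    Returns a consistent assignment {vertex: value in 1..m}, or None if none exists.
    """
    order, stack = [], [root]
    while stack:                         # preorder: every parent precedes its children
        u = stack.pop()
        order.append(u)
        stack.extend(children.get(u, []))

    S = {}                               # S[u]: values u can take in a consistent subtree assignment
    for u in reversed(order):            # leaves first
        allowed = set(range(1, m + 1))   # at a leaf this stays [m]
        for w in children.get(u, []):
            # T(w): values of u compatible with some value still available to w
            T = {a for a in range(1, m + 1)
                 if any(M[(u, w)][a - 1][b - 1] for b in S[w])}
            allowed &= T
        S[u] = allowed

    if not S[root]:
        return None
    f = {root: min(S[root])}
    for u in order:                      # work back down the tree
        for w in children.get(u, []):
            f[w] = min(b for b in S[w] if M[(u, w)][f[u] - 1][b - 1])
    return f


neq = [[0, 1], [1, 0]]                   # endpoints must differ
M = {("v", "a"): neq, ("v", "b"): neq, ("a", "c"): [[1, 0], [0, 0]]}
children = {"v": ["a", "b"], "a": ["c"]}
print(solve_tree(children, "v", M, 2))
```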

4 Model A: Resolution Complexity 

For a boolean CNF-formula $F$, a resolution refutation of $F$ with length $r$ is a sequence of clauses $C_1, \ldots, C_r = \emptyset$ such that each $C_i$ is either a clause of $F$, or is derived from two earlier clauses $C_j, C_{j'}$ for $j, j' < i$ by the following rule: $C_j = (A \vee x)$, $C_{j'} = (B \vee \bar{x})$ and $C_i = (A \vee B)$, for some variable $x$. The resolution complexity of $F$, denoted $\mathrm{RES}(F)$, is the length of the shortest resolution refutation of $F$. (If $F$ is satisfiable then $\mathrm{RES}(F) = \infty$.)
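The resolution rule and the notion of a refutation are easy to state in code. The minimal checker below is our illustration and uses hypothetical names. Clauses are frozensets of DIMACS-style integer literals, and each step either cites a clause of $F$ or resolves two earlier clauses on a variable:

```python
def resolve(c1, c2, x):
    """Resolution rule: from (A or x) and (B or not-x) derive (A or B).

    Clauses are frozensets of nonzero ints, DIMACS style (-x means not-x).
    """
    if x not in c1 or -x not in c2:
        raise ValueError("the clauses do not clash on x")
    return (c1 - {x}) | (c2 - {-x})


def is_refutation(formula, steps):
    """Check a resolution refutation of `formula` (a set of clauses).

    Each step is ("axiom", clause) or ("res", j, k, x), resolving earlier
    steps j and k on x. The last clause must be the empty clause.
    """
    derived = []
    for step in steps:
        if step[0] == "axiom":
            if step[1] not in formula:
                return False
            derived.append(step[1])
        else:
            _, j, k, x = step
            if not (0 <= j < len(derived) and 0 <= k < len(derived)):
                return False
            try:
                derived.append(resolve(derived[j], derived[k], x))
            except ValueError:
                return False
    return bool(derived) and derived[-1] == frozenset()


# F = (x1) and (not x1 or x2) and (not x2): a refutation of length 5.
F = {frozenset({1}), frozenset({-1, 2}), frozenset({-2})}
steps = [("axiom", frozenset({1})), ("axiom", frozenset({-1, 2})),
         ("res", 0, 1, 1),                      # derives (x2)
         ("axiom", frozenset({-2})),
         ("res", 2, 3, 2)]                      # derives the empty clause
print(is_refutation(F, steps))                  # True
```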

Mitchell [20] discusses two natural ways to extend the notion of resolution complexity to the setting of a CSP. These two measures of resolution complexity are denoted C-RES and NG-RES. Here, our focus will be on the C-RES measure, as it was in [19] and in [23].

Given an instance $\mathcal{I}$ of a CSP in which every variable has domain $\{1, \ldots, m\}$, we construct a boolean CNF-formula $\mathrm{CNF}(\mathcal{I})$ as follows. For each variable $x$ of $\mathcal{I}$, there are $m$ variables in $\mathrm{CNF}(\mathcal{I})$, denoted $x:1, x:2, \ldots, x:m$, and there is a domain clause $(x:1 \vee \cdots \vee x:m)$. For each pair of variables $x, y$ and each restriction $(i, j)$ such that $M_{(x,y)}(i, j) = 0$, $\mathrm{CNF}(\mathcal{I})$ has a conflict clause $(\overline{x:i} \vee \overline{y:j})$. We also add $\binom{m}{2}$ 2-clauses for each $x$ which specify that $x:i$ can be true for at most one value of $i$. It is easy to see that $\mathrm{CNF}(\mathcal{I})$ has a satisfying assignment iff $\mathcal{I}$ does. We define the resolution complexity of $\mathcal{I}$, denoted C-RES($\mathcal{I}$), to be equal to RES(CNF($\mathcal{I}$)).
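The encoding $\mathrm{CNF}(\mathcal{I})$ can be sketched directly, using DIMACS-style integer ids for the boolean variables $x:i$. The numbering scheme and function names below are our own choices for illustration:

```python
from itertools import combinations


def csp_to_cnf(variables, m, constraints):
    """Build CNF(I) for a binary CSP, with DIMACS-style integer literals.

    variables:   CSP variables, each with domain {1, ..., m}.
    constraints: dict (x, y) -> m x m 0-1 matrix, M[i-1][j-1] == 0 iff (i, j) is forbidden.
    The boolean variable "x : i" gets the id lit(x, i).
    """
    pos = {x: k for k, x in enumerate(variables)}

    def lit(x, i):
        return pos[x] * m + i                      # ids 1, 2, ..., m * len(variables)

    clauses = []
    for x in variables:
        clauses.append([lit(x, i) for i in range(1, m + 1)])          # domain clause
        for i, j in combinations(range(1, m + 1), 2):                   # at most one value
            clauses.append([-lit(x, i), -lit(x, j)])
    for (x, y), M in constraints.items():
        for i in range(1, m + 1):
            for j in range(1, m + 1):
                if M[i - 1][j - 1] == 0:                                # conflict clause
                    clauses.append([-lit(x, i), -lit(y, j)])
    return clauses


# Two variables with domain {1, 2, 3} that must take different values.
neq = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
cnf = csp_to_cnf(["x", "y"], 3, {("x", "y"): neq})
print(len(cnf))   # 2 domain + 2 * 3 at-most-one + 3 conflict clauses = 11
```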
A variable $x$ is free if any assignment which satisfies $\mathcal{I} - x$ can be extended to a satisfying assignment of $\mathcal{I}$. The boundary $\mathcal{B}(\mathcal{I})$ is the set of free variables. We extend a key result from [20] to the case where $m$ grows with $n$:

Lemma 7. Suppose that there exist $s, \zeta > 0$ such that

(a) every subproblem on at most $s$ variables is satisfiable, and

(b) every subproblem $\mathcal{I}'$ on $v$ variables where $s/2 \le v \le s$ has $|\mathcal{B}(\mathcal{I}')| \ge \zeta v$,

then C-RES(X) > 

The proof is a straightforward adaptation of the proof of the corresponding 
work in [20] and so we omit it. 

We assume now that $\epsilon$ is a small positive constant and

$$m \ge (\ln n)^{1+\epsilon},\qquad d = c\ln m \quad\text{and}\quad p_2 \text{ is constant}. \tag{9}$$

Let $\gamma$ be a sufficiently small constant. Let $\mathcal{T}_1$ denote the set of vertices $v$ for which there are $\gamma d$ neighbours $W$ and an assignment of values to $W$ for which $v$ has no consistent assignment.

Lemma 8. $\Pr(\mathcal{T}_1 \ne \emptyset) = o(1)$.

Proof.

$$\mathbf{E}(|\mathcal{T}_1|) \le n\sum_{t=\gamma d}^{n-1}\binom{n-1}{t}\Big(\frac{d}{n}\Big)^t\binom{t}{\gamma d}m^{\gamma d}\big(1 - p_2^{\gamma d}\big)^m = o(1).$$

□



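Now we show that whp every set of $s \le s_0 = \alpha n$ vertices, $\alpha = \gamma/8$, has fewer than $\gamma ds/2$ edges. Let $B$ denote this event.

Lemma 9. $\Pr(B) = 1 - o(1)$.

Proof.

$$\Pr(\bar{B}) \le \sum_{s=\gamma d}^{\alpha n}\binom{n}{s}\binom{\binom{s}{2}}{\gamma ds/2}\Big(\frac{d}{n}\Big)^{\gamma ds/2} \le \sum_{s=\gamma d}^{\alpha n}\left(e\Big(\frac{e}{\gamma}\Big)^{\gamma d/2}\Big(\frac{s}{n}\Big)^{-1+\gamma d/2}\right)^{s} = o(1).$$

□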
Let us now check the conditions of Lemma 7. Condition (a) holds because Lemma 9 implies that if $s = |S| \le \alpha n$ then we can order $S$ as $v_1, v_2, \ldots, v_s$ so that $v_j$ has fewer than $\gamma d$ neighbours among $v_1, v_2, \ldots, v_{j-1}$ for $1 \le j \le s$. Because we can assume that $\mathcal{T}_1 = \emptyset$ (Lemma 8) we see that it will be possible to sequentially assign values to $v_1, v_2, \ldots, v_s$ in order. Lemma 9 implies that at least half the vertices of $S$ have degree $< \gamma d$ in $S$ and now $\mathcal{T}_1 = \emptyset$ implies that (b) holds with $\zeta = 1/2$.

We conclude that with the parameters as stated in (9), C-RES($\mathcal{I}$) is whp as large as is claimed by Theorem 2.
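The ordering used for condition (a) comes from the standard peeling (degeneracy-ordering) argument. Repeatedly remove a vertex with fewer than $k$ neighbours among those remaining, then reverse the removal order. The sketch below uses hypothetical names. It returns `None` exactly when some subgraph has minimum degree at least $k$, in which case no such ordering exists.

```python
def low_degree_order(adj, S, k):
    """Order S as v_1, ..., v_s so that each v_j has fewer than k neighbours
    among v_1, ..., v_{j-1}; return None if no such order exists."""
    remaining = set(S)
    # degree of each vertex inside the part of S not yet removed
    deg = {v: sum(1 for w in adj[v] if w in remaining) for v in remaining}
    removed = []
    while remaining:
        v = next((u for u in remaining if deg[u] < k), None)
        if v is None:          # every remaining vertex has >= k neighbours left
            return None
        remaining.remove(v)
        removed.append(v)
        for w in adj[v]:
            if w in remaining:
                deg[w] -= 1
    return removed[::-1]       # the vertex removed last becomes v_1


def has_property(adj, order, k):
    """Check that each vertex has fewer than k neighbours earlier in `order`."""
    seen = set()
    for v in order:
        if sum(1 for w in adj[v] if w in seen) >= k:
            return False
        seen.add(v)
    return True


# A 4-cycle a-b-c-d-a with the chord a-c (minimum degree 2).
adj = {"a": {"b", "d", "c"}, "b": {"a", "c"}, "c": {"b", "d", "a"}, "d": {"c", "a"}}
print(low_degree_order(adj, adj, 2))      # None
order = low_degree_order(adj, adj, 3)
print(order, has_property(adj, order, 3))
```

In the setting above, $k = \gamma d$ and $\mathcal{T}_1 = \emptyset$ guarantees that each $v_j$ can be given a value consistent with its earlier neighbours.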



5 Model B: Satisfiability 

We have a blocked edge iff $M = O$ and this happens with probability $q_2^{m^2}$, and so there is not much more to say on this point.

Secondly, if $M \ne O$ then there are two values $x, y$ which can be assigned to adjacent vertices. This implies that for any bipartite subgraph $H$ of $G$ there is a satisfying assignment for $H$ just using $x, y$. So, in particular, there will be no blocked vertices.

Let us now consider Theorem 3. Let $H$ be the graph defined by treating $M$ as its adjacency matrix. Thus $H = G_{m,p_2}$. As such it has a clique $I$ of size $(2 - o(1))\ln m/\ln(1/q_2)$.

If we can properly colour $G$ with $I$ (i.e. give adjacent vertices different values in $I$) then we will have a satisfying assignment for our CSP. Now the chromatic number of $G$ is $(1 + o(1))d/(2\ln d)$ whp. So the CSP is satisfiable whp if

$$(2 - o(1))\ln m/\ln(1/q_2) \ge (1 + o(1))d/(2\ln d),$$

and this holds under assumption (a).

For (b) we observe that we can find a clique of size $(1 - o(1))\ln m/\ln(1/q_2)$ in polynomial time and we can colour $G$ with $(1 + o(1))d/\ln d$ colours in polynomial time.
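The strategy for part (b) is to colour $G$ and map each colour class to a distinct value of a clique of $H$, so that every edge of $G$ gets two distinct values that $M$ allows. The sketch below is a toy illustration with simple greedy routines as stand-ins. The paper does not say which polynomial-time clique and colouring algorithms it uses, so these choices and all names are assumptions.

```python
import random


def greedy_clique(H, values):
    """Scan `values`, keeping each one adjacent in H to every value kept so far."""
    clique = []
    for v in values:
        if all(u in H[v] for u in clique):
            clique.append(v)
    return clique


def greedy_colouring(G):
    """Give each vertex (in dict order) the smallest colour unused by its coloured neighbours."""
    colour = {}
    for v in G:
        used = {colour[u] for u in G[v] if u in colour}
        c = 0
        while c in used:
            c += 1
        colour[v] = c
    return colour


def assign_via_clique(G, H, m):
    """Colour G, then map colour c to the c-th value of a clique of H. Adjacent
    variables get distinct values adjacent in H, so M (the adjacency matrix of H)
    allows every constrained pair. Returns None if there are too few values."""
    clique = greedy_clique(H, range(1, m + 1))
    colour = greedy_colouring(G)
    if max(colour.values(), default=-1) >= len(clique):
        return None
    return {v: clique[c] for v, c in colour.items()}


def gnp(labels, p, rng):
    """Random graph G(n, p) on the given labels, as a dict of neighbour sets."""
    adj = {v: set() for v in labels}
    for i, u in enumerate(labels):
        for v in labels[i + 1:]:
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


rng = random.Random(5)
m, n = 200, 300
H = gnp(list(range(1, m + 1)), 0.5, rng)      # H = G_{m, p2} with p2 = 0.5
G = gnp(list(range(n)), 2.0 / n, rng)         # G = G_{n, p1} with d = 2
f = assign_via_clique(G, H, m)
ok = f is None or all(f[u] != f[v] and f[v] in H[f[u]] for u in G for v in G[u])
print(f is not None, ok)
```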

We now prove part (c) of Theorem 3. We first observe

Lemma 10. There exists a constant $\epsilon_0$ such that for $\epsilon < \epsilon_0$ there exist $R_0 = R_0(\epsilon)$, $Q_0 = Q_0(\epsilon)$ such that if $Q \ge Q_0$, $R \ge R_0$ and $s_0 = R\ln m$ then

(a) whp every pair of disjoint sets $S_1, S_2 \subseteq [m]$, $|S_1| = s_1 \ge s_0$, $|S_2| = s_2 \ge s_0$ contains at most $(1-\epsilon)s_1s_2$ $S_1{:}S_2$ edges of $H$;

(b) whp every $S \subseteq [m]$, $|S| = s \ge s_0$ contains at most $Q\ln m$ members with degree greater than $(1-\epsilon)s$ in the subgraph of $H$ induced by $S$.

Proof.

(a) We can bound the probability that there are sets $S_1, S_2$ with more than the stated number of $S_1{:}S_2$ edges by

$$\sum_{s_1,s_2\ge s_0}\binom{m}{s_1}\binom{m}{s_2}\binom{s_1s_2}{(1-\epsilon)s_1s_2}p_2^{(1-\epsilon)s_1s_2} = o(1).$$



(b) We choose $\epsilon > 0$ so that $p_2 < 1 - 3\epsilon$. Given $S$, we consider a set $L \subseteq S$ of size $Q\ln m$. For $R \ge Q\epsilon^{-1}$ we have $|L| \le \epsilon|S|$ and so if each $i \in L$ has at least $(1-\epsilon)s$ neighbours in $S$ then it has at least $(1-2\epsilon)s$ neighbours in $S - L$. By the Chernoff bound, this occurs with probability at most $e^{-Cs|L|} = m^{-CQs}$ for some $C > 0$. Therefore, the expected number of $S, L$ violating part (b) is at most

$$\sum_{s\ge s_0}\binom{m}{s}\binom{s}{Q\ln m}m^{-CQs} \le \sum_{s\ge s_0}\big(2m^{1-CQ}\big)^{s} = o(1)$$

for $Q$ sufficiently high.

□

Now consider an assignment $\sigma$ for our CSP and let $N_i$ be the set of variables that are assigned the value $i$ by $\sigma$. We observe that if $\sigma$ is consistent then each $N_i$ is an independent set in $G$ and so whp $G$ is such that we must have

$$|N_i| \le \frac{3n\ln d}{d} \le \frac{4n}{K\ln m}\qquad\text{for } i = 1, 2, \ldots, m. \tag{10}$$

Thus, we will restrict our attention to assignments which satisfy (10). We will prove that the expected number of such assignments that are consistent is $o(1)$, thus proving part (c) of Theorem 3.

We say that a pair of vertices is forbidden by $\sigma$ if that pair cannot form an edge of $G$ without violating $\sigma$. Note that every pair in the same set $N_i$ is forbidden, and a pair in $N_i \times N_j$ is forbidden iff $ij$ is not an edge of $H$. We will show that the number of forbidden pairs is at least $n^2/\ln\ln m$. It follows that

$$\Pr(\sigma \text{ is consistent}) \le (1 - p_1)^{n^2/\ln\ln m} = o(m^{-n}),$$

assuming that $d \ge K\ln m\ln\ln m$ for sufficiently large $K$. Since this probability is $o(m^{-n})$ we can multiply by $m^n$, which is an overcount of the number of assignments satisfying (10), and so obtain the desired first moment bound.
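The notion of a forbidden pair translates into a simple count. Since $\sigma$ is consistent exactly when no forbidden pair is an edge of $G = G_{n,p_1}$, the probability is $(1-p_1)$ raised to that count. The small sketch below uses a hypothetical toy $\sigma$ and $H$:

```python
from itertools import combinations


def forbidden_pairs(N, H):
    """Count vertex pairs of G forbidden by sigma.

    N: dict value i -> set N_i of variables assigned i.
    H: dict value -> set of values adjacent to it in H.
    """
    # every pair inside a single N_i is forbidden
    total = sum(len(Ni) * (len(Ni) - 1) // 2 for Ni in N.values())
    # a pair in N_i x N_j (i != j) is forbidden iff ij is not an edge of H
    for i, j in combinations(sorted(N), 2):
        if j not in H[i]:
            total += len(N[i]) * len(N[j])
    return total


def prob_consistent(N, H, p1):
    """Pr(sigma is consistent) = (1 - p1)^(number of forbidden pairs) in G_{n,p1}."""
    return (1 - p1) ** forbidden_pairs(N, H)


N = {1: {"a", "b"}, 2: {"c"}, 3: {"d", "e"}}
H = {1: {2}, 2: {1}, 3: set()}
print(forbidden_pairs(N, H))        # 8 of the 10 pairs
print(prob_consistent(N, H, 0.1))
```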

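Let $n_i = |N_i|$ and let $I = \{i : n_i \ge n/(2m)\}$. Now

$$\sum_{i\in I}n_i = n - \sum_{i\notin I}n_i \ge n - m\cdot\frac{n}{2m} = \frac{n}{2}. \tag{11}$$

For the following analysis we choose constants:

$$\epsilon,\qquad Q = \max\{Q_0, 100\epsilon^{-1}\},\qquad K_1 = 100R_0,\qquad K = 100K_1Q,$$

where $\epsilon < \epsilon_0$ and $Q_0, R_0$ are from Lemma 10.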
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We partition I into 3 parts: 

— I\ = {i : n/(iTilnmlnlnm) < Ui < dn/iiTlnm} 

— I 2 = {i ■ n/ <ni < n/(iTilnmlnlnm)} 

“ h = {i'- nl{2m) <rn< n/(itTi In m)^} 

Case 1: $\sum_{i\in I_1} n_i \ge n/6$. Let $H_1$ be the subgraph of H induced by $I_1$, and
for each $i \in I_1$, let $d(i)$ be the degree of i in $H_1$. Note that the total number
of forbidden pairs of vertices for G is at least

$$\frac{1}{2}\sum_{i\in I_1} d(i)\,n_i \times \frac{n}{K_1\ln m\ln\ln m}, \qquad (12)$$

since for all $i' \in I_1$, $n_{i'} \ge n/(K_1\ln m\ln\ln m)$.

By (10), we have $|I_1| > (K\ln m)/24$, so $(K\ln m)/Q < \varepsilon|I_1|$. Thus, by Lemma
10(b), there are at most $Q\ln m$ members $i \in I_1$ with $d(i) < (K\ln m)/Q$.
Again using (10), these members contribute at most $4Qn/K < n/12$ to $\sum_{i\in I_1} n_i$.
Therefore, the sum in (12) is at least

$$\frac{1}{2}\cdot\frac{K\ln m}{Q}\cdot\frac{n}{12}\cdot\frac{n}{K_1\ln m\ln\ln m} \ge \frac{n^2}{\ln\ln m}.$$



Case 2: $\sum_{i\in I_2} n_i \ge n/6$. Let $I(j) = \{i \in I_2 : n/2^j < n_i \le n/2^{j-1}\}$ for
$\log_2(K_1\ln m\ln\ln m) \le j \le 2\log_2(K_1\ln m)$. We set $t_j = \sum_{i\in I(j)} n_i$ and
$s_j = |I(j)| \ge t_j \times (K_1\ln m\ln\ln m/n)$. We set $J = \{j : t_j \ge n/(100\ln\ln m)\}$ and note
that $s_j \ge s_0$ (from Lemma 10) for each $j \in J$. Note also that

$$\sum_{j\in J} t_j \ge \frac{n}{6} - 2\log_2(K_1\ln m)\times\frac{n}{100\ln\ln m}.$$




Consider $I(j)$ for any $j \in J$. By Lemma 10, there are at least $\varepsilon\binom{s_j}{2}$ pairs
$i, i' \in I(j)$ such that every pair of vertices in $N_i \times N_{i'}$ is forbidden. Also, for
any i, every pair in $N_i \times N_i$ is forbidden. Since the sizes of the sets $N_i$, $i \in I(j)$,
differ by at most a factor of 2, this implies that the number of forbidden pairs
in $\bigcup_{i\in I(j)} N_i$ is at least $\frac{\varepsilon}{8}t_j^2$. Now consider any pair $I(j), I(j')$ with $j, j' \in J$.
By Lemma 10(a), there are at least $\varepsilon s_j s_{j'}$ pairs $i \in I(j)$, $i' \in I(j')$ such that
every pair of vertices in $N_i \times N_{i'}$ is forbidden, and this implies that the number
of forbidden pairs in $\bigcup_{i\in I(j)} N_i \times \bigcup_{i'\in I(j')} N_{i'}$ is at least $\frac{\varepsilon}{4}t_j t_{j'}$. Thus, the total
number of forbidden pairs is at least

$$\frac{\varepsilon}{8}\Big(\sum_{j\in J} t_j\Big)^2 > \frac{n^2}{\ln\ln m}.$$



Case 3: $\sum_{i\in I_3} n_i \ge n/6$. Here we follow essentially the same argument
as in Case 2. Again, let $I(j) = \{i \in I_3 : n/2^j < n_i \le n/2^{j-1}\}$, but this time we
consider $2\log_2(K_1\ln m) \le j \le \log_2(2m)$. Again, $t_j = \sum_{i\in I(j)} n_i$ and $s_j = |I(j)|$,
but note that this time we have

$$s_j \ge \frac{t_j}{n/(K_1\ln m)^2}.$$

Here, we set $J = \{j : t_j \ge n/(K_1\ln m)\}$ and so again we have $s_j \ge s_0$ for every
$j \in J$. Moreover,

$$\sum_{j\in J} t_j \ge \frac{n}{6} - \log_2(2m)\times\frac{n}{K_1\ln m}.$$

The same argument as in Case 2 now goes through to imply that the total
number of forbidden pairs is at least

$$\frac{\varepsilon}{8}\Big(\sum_{j\in J} t_j\Big)^2 > \frac{n^2}{\ln\ln m}.$$



□ 
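The bookkeeping in Cases 2 and 3 can be sketched in Python: the dyadic classes $I(j)$, their totals $t_j$ and sizes $s_j$, and the way a per-class bound $\frac{\varepsilon}{8}t_j^2$ and a cross-class bound $\frac{\varepsilon}{4}t_jt_{j'}$ combine into $\frac{\varepsilon}{8}(\sum_j t_j)^2$. This is only an arithmetic illustration. The class sizes and ε are hypothetical inputs, the helper names are ours, and nothing here implements Lemma 10.

```python
from collections import defaultdict
from itertools import combinations


def dyadic_buckets(sizes, n):
    """Group indices i by the range n/2**j < sizes[i] <= n/2**(j-1), j >= 1.

    Mirrors the sets I(j) of Cases 2 and 3; `sizes` plays the role of the
    class sizes n_i (hypothetical values, not data from the paper).
    """
    buckets = defaultdict(list)
    for i, ni in enumerate(sizes):
        if not 0 < ni <= n:
            raise ValueError("each size must lie in (0, n]")
        j = 1
        while n / 2 ** j >= ni:  # advance until n/2**j < ni <= n/2**(j-1)
            j += 1
        buckets[j].append(i)
    return dict(buckets)


def bucket_totals(sizes, buckets):
    """t_j = total size of the classes in I(j), s_j = |I(j)|."""
    t = {j: sum(sizes[i] for i in idx) for j, idx in buckets.items()}
    s = {j: len(idx) for j, idx in buckets.items()}
    return t, s


def forbidden_pair_bound(t, eps):
    """Add (eps/8) t_j^2 within each class and (eps/4) t_j t_j' across classes."""
    within = sum(eps / 8 * tj ** 2 for tj in t.values())
    across = sum(eps / 4 * t[a] * t[b] for a, b in combinations(sorted(t), 2))
    return within + across


sizes = [300, 200, 150, 120, 90, 60, 40, 25, 15]  # hypothetical n_i, summing to n
n = 1000
buckets = dyadic_buckets(sizes, n)
t, s = bucket_totals(sizes, buckets)
bound = forbidden_pair_bound(t, eps=0.1)
print(buckets)
print(bound, 0.1 / 8 * sum(t.values()) ** 2)  # equal up to rounding
```

Within each class the sizes differ by less than a factor of 2, which is the property the counting argument relies on. The two printed numbers agree because $\sum_j t_j^2 + 2\sum_{j<j'} t_jt_{j'} = (\sum_j t_j)^2$.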



6 Model B: Resolution Complexity 

First note that whp every set of 10 vertices in H has a common neighbour,
since the probability of at least one such set not having a common neighbour is
$o(1)$. Assuming that H has this property, every vertex of degree at most 10 in G
will be in the boundary.

A straightforward first moment argument shows that a.s. every subgraph 
G' of G with at most vertices has at most 5|G'| edges. (We omit the 

standard calculation.) Therefore, every such G' has at least |G'|/11 vertices of 
degree at most 10. This implies both conditions of Lemma 7 with s = 
and C = l/(22d^/^) and thus implies Theorem 4. □ 
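The step from "at most $5|G'|$ edges" to "at least $|G'|/11$ vertices of degree at most 10" is a degree count. Let $k$ be the number of vertices of $G'$ whose degree in $G'$ is at most 10, so that the remaining $|G'| - k$ vertices have degree at least 11. Then:

```latex
\begin{align*}
11\,(|G'| - k) \;\le\; \sum_{v \in G'} \deg_{G'}(v) \;=\; 2\,|E(G')| \;\le\; 10\,|G'|
\quad\Longrightarrow\quad k \;\ge\; \frac{|G'|}{11}.
\end{align*}
```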

We remark that the exponent “3” of d in the statement of Theorem 4 can be 
replaced by values arbitrarily close to 2 by replacing “10” with a larger value in 
this proof. 
Abstract. In this paper we study continuous-time quantum walks on Cayley graphs
of the symmetric group, and prove various facts concerning such walks that
demonstrate significant differences from their classical analogues. In particular,
we show that for several natural choices for generating sets, these quantum walks
do not have uniform limiting distributions, and are effectively blind to large
areas of the graphs due to destructive interference.



1 Introduction 

According to our current understanding of physics, quantum mechanics provides 
sources of true randomness, and mathematically speaking much of the underly- 
ing framework of quantum information and computation may be viewed as an 
extension of the study of random processes. The focus in quantum information 
and computation is often placed on finding information processing tasks that can 
be performed with the help of quantum information (such as factoring integers 
in polynomial time [20] or implementing unconditionally secure key distribu- 
tion [7,21]) or on studying the distinctively non-classical aspects of quantum 
information (such as entanglement; see, for instance, [12]). However, it seems 
quite plausible that the study of quantum information and computation will 
also lead to new methods in the study of classical computation and random 
processes. Along these lines, Kerenidis and de Wolf [17] recently used quantum 
arguments to prove new results on (classical) locally decodable codes. 

As a step toward understanding the possible implications of quantum meth- 
ods for the study of random processes, it is natural to consider the differences 
between classical and quantum processes. One topic that has recently received
attention in the quantum computing community, and that highlights these
differences, is the study of quantum computational variants of random walks,
or quantum walks [1,3,5,6,8,9,11,15,18,19,23]. (A recent survey on quantum walks
by Kempe [16] is an ideal starting point for background on quantum walks.) 
In this paper we consider quantum walks on Cayley graphs of the symmetric 
group — a topic that has been suggested in at least two previous papers on quan- 
tum walks [16,3]. 

S. Arora et al. (Eds.): APPROX 2003 + RANDOM 2003, LNCS 2764, pp. 290-301, 2003.
Two main variants of quantum walks have been considered: continuous-time
quantum walks and discrete-time quantum walks. We restrict our attention to 
continuous-time quantum walks in this paper. Keeping in line with previous re- 
sults on quantum walks, we find some significant differences between quantum 
and classical random walks on Cayley graphs of the symmetric group. In partic- 
ular, we find that quantum walks on Cayley graphs of the symmetric group do 
not have uniform limiting distributions for several natural choices for the gener- 
ators. This answers a question recently suggested by Ahmadi, Belk, Tamon, and 
Wendler [3] concerning non-uniform mixing of quantum walks. 

One of the principal motivations for studying quantum walks has been that
quantum walks may potentially be useful as algorithmic tools. This potential 
was recently demonstrated by Childs, Cleve, Deotto, Farhi, Gutmann and Spiel- 
man [8], who prove that there exists a black-box problem for which a quantum 
algorithm based on quantum walks gives an exponential speed-up over any clas- 
sical randomized algorithm. The key to this algorithm is that a quantum walk 
is able to permeate a particular graph while any classical random walk (or any 
classical randomized algorithm, for that matter) cannot. One of the first prob- 
lems that comes to mind as an obvious challenge for the quantum algorithms 
community is the graph isomorphism problem, and it is natural to ask whether 
quantum walks, and in particular quantum walks on Cayley graphs of the sym- 
metric group, can be of any use for an algorithm for this problem. (While this was 
our primary motivation for studying quantum walks on the symmetric group, 
we have not found any way to apply our results to this problem.) 

2 Definitions 

2.1 Continuous-Time Quantum Walks on Graphs 

A continuous-time quantum walk on an undirected graph $\Gamma = (V, E)$ can be
defined in the following way. First, we let A be the $|V| \times |V|$ adjacency matrix
of Γ, let D be the $|V| \times |V|$ diagonal matrix for which the diagonal entry
corresponding to vertex v is $\deg(v)$, and let $L = D - A$. The matrix L is positive
semidefinite and, under the assumption that Γ is connected, 0 is an eigenvalue
with multiplicity 1; the uniform vector is a corresponding eigenvector. The
quantum walk on Γ is then given by the unitary matrix $U(t) = e^{-iLt}$ for $t \in \mathbb{R}$.
If the quantum walk on Γ is run for time t starting at vertex u, then the amplitude
associated with each vertex v is $U(t)[v,u]$, and thus measuring at this point (with
respect to the standard basis) results in each vertex v with probability $|U(t)[v,u]|^2$.
If instead of starting at a particular vertex u we have some quantum state
described by $\psi : V \to \mathbb{C}$, and we run the quantum walk for time t, the new
quantum state is described by $U(t)\psi$, and measuring results in each vertex v with
probability $|(U(t)\psi)[v]|^2$. Other types of measurements can be considered, but we
will focus just on this sort of measurement where the outcome is a vertex of the
graph. To our knowledge, continuous-time quantum walks were first considered
by Farhi and Gutmann [11].
Continuous-time quantum walks are analogous to continuous-time random
walks on Γ, where the evolution is described by $M(t) = e^{-Lt}$ rather than $U(t)$
as above. Specifically, if the continuous-time random walk is started at vertex u
and run for time t, the probability of being at vertex v is given by $M(t)[v,u]$.
Continuous-time random walks share many properties with their discrete-time
variants [4].
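A small numerical sketch of the two evolutions $U(t) = e^{-iLt}$ and $M(t) = e^{-Lt}$ may help. It assumes nothing beyond the definitions above. The helper names are hypothetical, and the matrix exponential is a plain scaling-and-squaring Taylor series written in pure Python. In practice one would use a library routine such as SciPy's `expm`.

```python
import math


def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(M, terms=30):
    """exp(M) by scaling and squaring with a truncated Taylor series."""
    n = len(M)
    norm = max(sum(abs(x) for x in row) for row in M)
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    A = [[x / 2 ** s for x in row] for row in M]  # scaled so that ||A|| <= 1/2
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]  # A^k / k!
        result = [[r + u for r, u in zip(rr, ur)] for rr, ur in zip(result, term)]
    for _ in range(s):
        result = mat_mul(result, result)  # exp(M) = exp(A)^(2^s)
    return result


def laplacian(adj):
    """L = D - A for a 0/1 adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def quantum_walk_probs(adj, u, t):
    """|U(t)[v, u]|^2 for every vertex v, with U(t) = exp(-iLt)."""
    L = laplacian(adj)
    U = mat_exp([[-1j * t * x for x in row] for row in L])
    return [abs(U[v][u]) ** 2 for v in range(len(adj))]


def classical_walk_probs(adj, u, t):
    """M(t)[v, u] for every vertex v, with M(t) = exp(-Lt)."""
    L = laplacian(adj)
    M = mat_exp([[-t * x for x in row] for row in L])
    return [M[v][u].real for v in range(len(adj))]


path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # a small example graph
print(quantum_walk_probs(path3, 0, 1.0))
print(classical_walk_probs(path3, 0, 1.0))
```

On a single edge, the quantum walk started at one endpoint is found at the other with probability $\sin^2 t$, which keeps oscillating. The classical walk approaches 1/2 as $(1 - e^{-2t})/2$.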

This paper is concerned with quantum walks on Cayley graphs, which are 
regular graphs. In the case of regular graphs there is no difference between using 
the matrix L and the adjacency matrix for the definition of quantum walks, 
and we find it is more convenient to use the adjacency matrix for the graphs 
we are considering. (Of course one cannot replace L with the adjacency matrix 
when discussing the classical case, since this would not give rise to a stochastic 
process — the equivalence only holds for the quantum case.) The reasoning behind 
this equivalence is as follows. Because D and A commute for regular graphs (for a
d-regular graph, $D = dI$), we see that $U(t) = e^{-iLt} = e^{-idt}e^{iAt}$; the difference
is a global phase factor, which has no significance when calculating the
probabilities. So, from here on in this paper we will consider the unitary process
given by $U(t) = e^{iAt}$ rather than $e^{-iLt}$.

In the case of classical random walks, there are various properties of random
walks that are of interest. One of the most basic properties of a classical random
walk is the limiting distribution (or stationary distribution). This distribution is
the uniform distribution for random walks on connected, regular graphs, and in
fact, as a result of the way we have defined continuous-time random walks, this
distribution is uniform for any connected, undirected graph; this is apparent by
considering the spectral decomposition of the matrix $e^{-Lt}$ (L is symmetric and,
for connected Γ, its kernel is spanned by the uniform vector).

As quantum walks are unitary (and therefore invertible) processes, they do 
not converge to any state, so one must be precise about what is meant by the 
limiting distribution. Suppose we have a quantum walk on some graph Γ and
some vertex u has been designated as the starting vertex. The probability of
measuring the walk at some vertex v after time t is, as described above, given
by $P_t[v] = |U(t)[v,u]|^2$. If t is chosen uniformly from some range $[0, T]$ then the
resulting distribution is

$$\bar{P}_T[v] = \frac{1}{T}\int_0^T P_t[v]\,dt.$$

In the limit for large T these distributions converge to some distribution P, which
is the limiting distribution of the quantum walk. This notion of the limiting
distribution for a quantum walk is discussed in [1].
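The cross terms between distinct eigenvalues in $|U(t)[v,u]|^2$ average to zero as T grows. The limiting distribution can therefore be computed exactly as $P[v] = \sum_\lambda |(\Pi_\lambda e_u)[v]|^2$, where $\Pi_\lambda$ is the projector onto the λ-eigenspace of L. The sketch below does this for the N-cycle, whose eigenvectors are known in closed form. The example graph, the function names and the rounding used to group equal eigenvalues are our own choices.

```python
import cmath
import math
from collections import defaultdict


def cycle_spectrum(N):
    """Laplacian eigenvalues 2 - 2cos(2*pi*k/N) of the N-cycle; the matching
    eigenvectors are v -> exp(2*pi*i*k*v/N) / sqrt(N)."""
    return [2 - 2 * math.cos(2 * math.pi * k / N) for k in range(N)]


def amplitude(N, v, t):
    """U(t)[v, 0] for U(t) = exp(-iLt) on the N-cycle."""
    lam = cycle_spectrum(N)
    return sum(cmath.exp(-1j * lam[k] * t) * cmath.exp(2j * math.pi * k * v / N)
               for k in range(N)) / N


def limiting_distribution(N, digits=9):
    """Sum over distinct eigenvalues of |projected amplitude|^2, start vertex 0."""
    lam = cycle_spectrum(N)
    probs = []
    for v in range(N):
        groups = defaultdict(complex)
        for k in range(N):
            # eigenvalues equal up to rounding belong to the same eigenspace
            groups[round(lam[k], digits)] += cmath.exp(2j * math.pi * k * v / N) / N
        probs.append(sum(abs(a) ** 2 for a in groups.values()))
    return probs


def time_average(N, v, T, steps=2000):
    """Midpoint-rule estimate of (1/T) * integral_0^T |U(t)[v, 0]|^2 dt."""
    h = T / steps
    return sum(abs(amplitude(N, v, (s + 0.5) * h)) ** 2 for s in range(steps)) / steps


print(limiting_distribution(4))
print(time_average(4, 1, math.pi))
```

For N = 4, the limiting distribution started at vertex 0 is (0.375, 0.125, 0.375, 0.125), which is not uniform. Since the eigenvalues 0, 2, 4 make the walk periodic with period π, the finite-T average over $[0, \pi]$ already agrees with it.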

2.2 Cayley Graphs and Representation Theory of the Symmetric Group

In this section we briefly discuss necessary background information on Cayley
graphs of the symmetric group and on representation theory of the symmetric
group, which is the main tool used in this paper to analyze quantum walks on
Cayley graphs.
Let G be a finite group and let $R \subseteq G$ be a set of generators for G satisfying
$g \in R \Rightarrow g^{-1} \in R$ for all $g \in G$. Then the Cayley graph of G with respect to R,
which we denote by $\Gamma(G, R)$ in this paper, is an undirected graph defined as
follows. The set of vertices of $\Gamma(G, R)$ coincides with G, and for any $g, h \in G$,
$\{g, h\}$ is an edge in $\Gamma(G, R)$ if and only if $gh^{-1} \in R$. Equivalently, if
$R = \{h_1, \ldots, h_d\}$ then each vertex g is adjacent to vertices $h_1 g, \ldots, h_d g$. Thus,
$\Gamma(G, R)$ is a regular graph of degree $d = |R|$. We will restrict our attention to
generating sets that form conjugacy classes. (The method we use for analyzing
quantum walks on Cayley graphs is limited to such generating sets.) Recall that
for some group G, elements g and h are conjugate if there exists some $a \in G$ such
that $a^{-1}ga = h$. This is an equivalence relation that partitions G into conjugacy
classes. A function $f : G \to \mathbb{C}$ is a class function if it is constant on conjugacy
classes of G.

The conjugacy classes in $S_n$ are determined by the cycle structures of elements
when they are expressed in the usual cycle notation. Recall that a partition
λ of n is a sequence $(\lambda_1, \ldots, \lambda_k)$ where $\lambda_1 \ge \cdots \ge \lambda_k \ge 1$ and $\lambda_1 + \cdots + \lambda_k = n$.
The notation $\lambda \vdash n$ indicates that λ is a partition of n. There is one conjugacy
class in $S_n$ for each partition $\lambda \vdash n$, which consists of those permutations having
cycle structure described by λ. We denote by $C_\lambda$ the conjugacy class of $S_n$
consisting of all permutations having cycle structure described by λ.
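The correspondence between partitions and conjugacy classes can be checked by brute force for small n. The sketch uses the standard class-size formula $|C_\lambda| = n!/\prod_k k^{m_k} m_k!$, where $m_k$ is the number of parts of λ equal to k. That formula is textbook material rather than something stated in this paper, and the function names are hypothetical.

```python
import itertools
import math
from collections import Counter


def partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def class_size(lam):
    """|C_lambda| = n! / prod_k (k^{m_k} * m_k!)."""
    denom = 1
    for k, m in Counter(lam).items():
        denom *= k ** m * math.factorial(m)
    return math.factorial(sum(lam)) // denom


def cycle_type(perm):
    """Cycle lengths of a permutation of {0, ..., n-1}, largest first."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start not in seen:
            length, x = 0, start
            while x not in seen:
                seen.add(x)
                x = perm[x]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths, reverse=True))


def brute_force_class_sizes(n):
    return Counter(cycle_type(p) for p in itertools.permutations(range(n)))


for lam in partitions(5):
    print(lam, class_size(lam))
```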

A representation of a group G is a homomorphism from G to $GL(d, \mathbb{C})$ for
some positive integer d, where $GL(d, \mathbb{C})$ denotes the general linear group of
invertible $d \times d$ complex matrices. The dimension of such a representation is d, and
we write $\dim(\rho)$ to denote the dimension of a given representation ρ. Two
representations $\rho_1 : G \to GL(d_1, \mathbb{C})$ and $\rho_2 : G \to GL(d_2, \mathbb{C})$ are equivalent if there
exists an invertible linear mapping $A : \mathbb{C}^{d_1} \to \mathbb{C}^{d_2}$ such that $A\rho_1(g) = \rho_2(g)A$
for all $g \in G$; otherwise they are inequivalent. A representation ρ of dimension d
is irreducible if there are no non-trivial invariant subspaces of $\mathbb{C}^d$ under ρ. That
is, if $W \subseteq \mathbb{C}^d$ is a subspace of $\mathbb{C}^d$ such that $\rho(g)W \subseteq W$ for all $g \in G$, then
$W = \mathbb{C}^d$ or $W = \{0\}$. A collection of inequivalent, irreducible representations
is said to be complete if every irreducible representation is equivalent to one
of the representations in this set. It holds that any complete set of irreducible
representations can be put into one-to-one correspondence with the conjugacy
classes of the group in question.

The character corresponding to a representation ρ is a mapping $\chi_\rho : G \to \mathbb{C}$
obtained by taking the trace of the representation: $\chi_\rho(g) = \mathrm{tr}(\rho(g))$. Using the
cyclic property of the trace it follows that the characters are constant on the
conjugacy classes of a group. If we have a complete set of inequivalent, irreducible
representations of a group, then the corresponding characters form an orthogonal
basis for the space of all class functions.

The Fourier transform $\hat{f}$ of a complex-valued function f on G at a representation
ρ is $\hat{f}(\rho) = \sum_{g\in G} f(g)\rho(g)$.

Fact 1 Let f be a class function on a group G and ρ be an irreducible representation
of G; then $\hat{f}(\rho) = \frac{1}{\dim(\rho)}\Big(\sum_{g\in G} f(g)\chi_\rho(g)\Big) I$.
For the symmetric group on n elements there is a particular way of associating
the partitions of n (which are in one-to-one correspondence with the
conjugacy classes of $S_n$) with a complete set of inequivalent, irreducible
representations of $S_n$. These particular representations are said to be in Young's
natural form. (Several textbooks describe the specific method for constructing these
representations; see, for instance, James and Kerber [14]. It will not be important
for this paper to discuss the actual construction of these representations.)
These representations have the special property that all matrix entries in these
representations are integers. Once we have these irreducible representations, it is
possible to associate with each one an equivalent irreducible representation that
has the property that $\rho(g)$ is a unitary matrix for every $g \in S_n$. The irreducible,
unitary representation associated with a given partition $\lambda \vdash n$ will be denoted
$\rho_\lambda$, and the corresponding character will be denoted $\chi_\lambda$. The following fact
regarding these representations will be useful.

Fact 2 Let λ and μ be partitions of n and let $\rho_\lambda$ and $\rho_\mu$ be the associated unitary
representations as described above. Then for all $1 \le i, j \le \dim(\rho_\lambda)$ and
$1 \le k, l \le \dim(\rho_\mu)$,

$$\sum_{g\in S_n}\overline{\rho_\lambda(g)[i,j]}\,\rho_\mu(g)[k,l] = \begin{cases}\dfrac{n!}{\dim(\rho_\lambda)} & \text{if } \lambda = \mu,\ i = k, \text{ and } j = l,\\[4pt] 0 & \text{otherwise.}\end{cases}$$

When $\lambda, \nu \vdash n$, we write $\chi_\lambda(\nu)$ to denote the character $\chi_\lambda$ evaluated at an
arbitrary $g \in C_\nu$, and more generally if f is a class function we write $f(\nu)$ to
mean $f(g)$ for any $g \in C_\nu$.

Fact 3 The sum of the squares of the characters of a conjugacy class over any
complete, irreducible set of representations of a group G multiplied by the order
of the class is the order of G. Thus, we have $|C_\lambda|\sum_{\nu\vdash n}\chi_\nu(\lambda)^2 = n!$ for every
$\lambda \vdash n$.



It will be necessary for us to be able to evaluate the characters associated 
with the irreducible representations of the symmetric group in certain instances. 
The Murnaghan-Nakayama rule provides a tool for doing this — information on 
the Murnaghan-Nakayama rule can be found in [22]. 



3 Continuous-Time Quantum Walks on $\Gamma(S_n, C_\lambda)$

In this section we analyze the quantum walk on $\Gamma(S_n, C_\lambda)$ for $\lambda \vdash n$. Our
analysis implies that for some natural choices for λ the quantum walk on $\Gamma(S_n, C_\lambda)$
does not have a uniform limiting distribution with respect to the definition
discussed in the previous section. In essence, the quantum walk has a significant
"blind spot" consisting of all n-cycles (i.e., permutations having cycle structure
consisting of a single n-cycle).

This section is divided into three subsections. First we prove a general result
concerning the spectral decomposition of quantum walks on $S_n$. We then consider
the case where the generating set consists of the set of all transpositions, and
finally the case where the generating set consists of all p-cycles for any choice of
$p \in \{2, \ldots, n\}$.

3.1 Spectral Decomposition and Periodicity 

Define $c_\lambda : S_n \to \mathbb{C}$ to be the unit vector that is uniform on the conjugacy class
$C_\lambda$ and zero everywhere else: $c_\lambda[g] = 1/\sqrt{|C_\lambda|}$ if $g \in C_\lambda$, and $c_\lambda[g] = 0$ otherwise.

The analysis of quantum walks on $\Gamma(S_n, C_\lambda)$ is greatly simplified by the fact
that these walks are constant on conjugacy classes, in the following sense.

Proposition 4 Let $a_t(g)$ denote the amplitude associated with vertex g after
evolving the quantum walk on $\Gamma(S_n, C_\lambda)$ for time t, assuming the walk starts on
a conjugacy class, i.e., $a_t(g) = (U(t)c_\mu)[g]$ for some $\mu \vdash n$. Then for all t, $a_t$ is a
class function.

The following theorem will be one of the main tools used in our analysis. 

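Theorem 5. Assume $H[g,h] = f(g^{-1}h)$ for all $g, h \in S_n$, where f is a class
function on $S_n$, and let $U(t) = e^{iHt}$ for all $t \in \mathbb{R}$. Then for any partitions
$\lambda, \mu \vdash n$ we have

$$c_\lambda^* U(t) c_\mu = \frac{\sqrt{|C_\lambda||C_\mu|}}{n!}\sum_{\nu\vdash n}\exp\Big(\frac{it}{\dim(\rho_\nu)}\sum_{\gamma\vdash n}|C_\gamma|\,f(\gamma)\chi_\nu(\gamma)\Big)\chi_\nu(\lambda)\chi_\nu(\mu).$$

In order to prove this theorem we will use the following lemma, by which a
complete orthogonal set of eigenvectors and eigenvalues of $U(t)$ can be obtained.

Lemma 1. Assume $H[g,h] = f(g^{-1}h)$ for all $g, h \in S_n$, where f is a class
function on $S_n$. Define vectors $\psi_{\nu,i,j} : S_n \to \mathbb{C}$ for each $\nu \vdash n$, $1 \le i, j \le \dim(\rho_\nu)$
by $\psi_{\nu,i,j}[g] = \rho_\nu(g)[i,j]$ for all $g \in S_n$. Then each $\psi_{\nu,i,j}$ is an eigenvector
of H with associated eigenvalue $\frac{1}{\dim(\rho_\nu)}\sum_{h\in S_n} f(h)\chi_\nu(h)$. Moreover, these
eigenvectors are pairwise orthogonal and span the space $\mathbb{C}^{S_n}$.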

Remark. The fact described in Lemma 1 is not new — for instance, it is discussed 
in Section 3E of [10] for general finite groups. A short proof of the lemma follows. 

Proof of Lemma 1. For each $g \in S_n$ we have

$$(H\psi_{\nu,i,j})[g] = \sum_{h\in S_n} f(g^{-1}h)\,\rho_\nu(h)[i,j] = \sum_{h\in S_n} f(h)\,\rho_\nu(gh)[i,j].$$

Now, since $\rho_\nu$ is a homomorphism, we have $\rho_\nu(gh) = \rho_\nu(g)\rho_\nu(h)$, which implies

$$(H\psi_{\nu,i,j})[g] = \sum_{k=1}^{\dim(\rho_\nu)}\rho_\nu(g)[i,k]\,\hat{f}(\rho_\nu)[k,j].$$

By Fact 1 we see that

$$(H\psi_{\nu,i,j})[g] = \frac{1}{\dim(\rho_\nu)}\Big(\sum_{h\in S_n} f(h)\chi_\nu(h)\Big)\rho_\nu(g)[i,j].$$

This establishes that the vectors $\psi_{\nu,i,j}$ are eigenvectors with associated eigenvalues
as claimed. The fact that these eigenvectors are pairwise orthogonal follows
from Fact 2, and the fact that they span the entire space $\mathbb{C}^{S_n}$ follows from this
orthogonality along with Fact 3. ■
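Lemma 1 can be checked numerically on $S_3$, whose irreducible representations are the trivial representation, the sign representation and a 2-dimensional standard representation. The sketch below realizes the standard representation as the permutation action restricted to the sum-zero subspace of $\mathbb{C}^3$ in an orthonormal basis. That basis is our choice and is not Young's natural form. The sketch then builds $\psi_{\nu,i,j}[g] = \rho_\nu(g)[i,j]$ and asserts both the eigenvector equation and the orthogonality relation of Fact 2. All names are hypothetical.

```python
import itertools
import math

PERMS = list(itertools.permutations(range(3)))  # the six elements of S_3


def compose(g, h):
    """(g h)(x) = g(h(x))."""
    return tuple(g[h[x]] for x in range(3))


def inverse(g):
    inv = [0] * 3
    for x, gx in enumerate(g):
        inv[gx] = x
    return tuple(inv)


def sign(g):
    """(-1) ** (number of inversions)."""
    return (-1) ** sum(1 for a, b in itertools.combinations(range(3), 2) if g[a] > g[b])


# Orthonormal basis of the sum-zero subspace of C^3.
BASIS = [[1 / math.sqrt(2), -1 / math.sqrt(2), 0.0],
         [1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6)]]


def permute(g, vec):
    """Permutation matrix P_g applied to vec: (P_g vec)[g(x)] = vec[x]."""
    out = [0.0] * 3
    for x in range(3):
        out[g[x]] = vec[x]
    return out


def standard_rep(g):
    """rho(g)[i][j] = <b_i, P_g b_j>, a 2x2 real orthogonal matrix."""
    return [[sum(a * b for a, b in zip(BASIS[i], permute(g, BASIS[j])))
             for j in range(2)] for i in range(2)]


REPS = {"trivial": lambda g: [[1.0]],
        "sign": lambda g: [[float(sign(g))]],
        "standard": standard_rep}


def eigenvectors(rho):
    """All psi_{nu,i,j}, as dicts g -> rho(g)[i][j], for one representation."""
    d = len(rho(PERMS[0]))
    return d, [{g: rho(g)[i][j] for g in PERMS} for i in range(d) for j in range(d)]


def check_lemma1(f):
    """Assert H psi = lambda psi for H[g][h] = f(g^{-1} h); return the lambdas."""
    eigenvalues = {}
    for name, rho in REPS.items():
        d, vecs = eigenvectors(rho)
        lam = sum(f(h) * sum(rho(h)[k][k] for k in range(d)) for h in PERMS) / d
        eigenvalues[name] = lam
        for psi in vecs:
            for g in PERMS:
                h_psi = sum(f(compose(inverse(g), h)) * psi[h] for h in PERMS)
                assert abs(h_psi - lam * psi[g]) < 1e-9
    return eigenvalues


def check_orthogonality():
    """Fact 2 (entries are real): <psi, psi'> = 6/dim if psi = psi', else 0."""
    vecs = []
    for rho in REPS.values():
        d, vs = eigenvectors(rho)
        vecs.extend((d, v) for v in vs)
    for a, (d, va) in enumerate(vecs):
        for b, (_, vb) in enumerate(vecs):
            ip = sum(va[g] * vb[g] for g in PERMS)
            assert abs(ip - (len(PERMS) / d if a == b else 0.0)) < 1e-9
    return len(vecs)


def moved_points(g):
    return sum(1 for x in range(3) if g[x] != x)


print(check_lemma1(lambda g: 1.0 if moved_points(g) == 2 else 0.0))
print(check_orthogonality())  # 6 orthogonal vectors, i.e. a basis of C^{S_3}
```

With f the indicator of the transpositions, the eigenvalues come out as 3 (trivial), −3 (sign) and, up to rounding, 0 (standard). This matches $\frac{1}{\dim(\rho_\nu)}\sum_h f(h)\chi_\nu(h)$, because the standard character vanishes on transpositions.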

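Proof of Theorem 5. By Lemma 1 we may write

$$H = \sum_{\nu\vdash n}\sum_{j,k=1}^{\dim(\rho_\nu)}\Big(\frac{1}{\dim(\rho_\nu)}\sum_{\gamma\vdash n}|C_\gamma|\,f(\gamma)\chi_\nu(\gamma)\Big)\frac{\psi_{\nu,j,k}\psi_{\nu,j,k}^*}{\|\psi_{\nu,j,k}\|^2}$$

and therefore

$$U(t) = \sum_{\nu\vdash n}\sum_{j,k=1}^{\dim(\rho_\nu)}\exp\Big(\frac{it}{\dim(\rho_\nu)}\sum_{\gamma\vdash n}|C_\gamma|\,f(\gamma)\chi_\nu(\gamma)\Big)\frac{\psi_{\nu,j,k}\psi_{\nu,j,k}^*}{\|\psi_{\nu,j,k}\|^2}.$$

Let $X_\lambda : S_n \to \mathbb{C}$ denote the characteristic function of $C_\lambda$ for $\lambda \vdash n$. Then we
have that

$$c_\lambda^*\psi_{\nu,j,k} = \frac{1}{\sqrt{|C_\lambda|}}\hat{X}_\lambda(\rho_\nu)[j,k] = \begin{cases}\dfrac{\sqrt{|C_\lambda|}\,\chi_\nu(\lambda)}{\dim(\rho_\nu)} & \text{if } j = k,\\[4pt] 0 & \text{otherwise,}\end{cases}$$

by Fact 1. By Fact 2 we have $\|\psi_{\nu,j,k}\|^2 = n!/\dim(\rho_\nu)$. So,

$$c_\lambda^* U(t) c_\mu = \sum_{\nu\vdash n}\frac{\dim(\rho_\nu)}{n!}\exp\Big(\frac{it}{\dim(\rho_\nu)}\sum_{\gamma\vdash n}|C_\gamma|\,f(\gamma)\chi_\nu(\gamma)\Big)\sum_{j=1}^{\dim(\rho_\nu)}\frac{\sqrt{|C_\lambda||C_\mu|}\,\chi_\nu(\lambda)\chi_\nu(\mu)}{\dim(\rho_\nu)^2}$$

$$= \frac{\sqrt{|C_\lambda||C_\mu|}}{n!}\sum_{\nu\vdash n}\exp\Big(\frac{it}{\dim(\rho_\nu)}\sum_{\gamma\vdash n}|C_\gamma|\,f(\gamma)\chi_\nu(\gamma)\Big)\chi_\nu(\lambda)\chi_\nu(\mu),$$

which is what we wanted to show.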



Theorem 5 implies the following interesting fact. 

Proposition 6 Any continuous-time quantum walk on the Cayley graph of the
symmetric group for which the generators form conjugacy classes is periodic,
with period $2\pi/k$ for some $k \in \{1, 2, 3, \ldots\}$.

Proof. Using Fact 1 we see that the quantity $|C_\gamma|\chi_\nu(\gamma)/\dim(\rho_\nu)$ is a sum of
matrix elements of irreducible representations. This quantity is independent of
the particular choice of basis for the irreducible representations, so we may
choose the basis that corresponds to Young's natural form, in which all of the
matrix entries are integer valued, implying that the quantity itself is integer
valued. Using Fact 3 and Theorem 5 we therefore have $U(2\pi) = U(0) = I$.
Thus the period of the walk must divide $2\pi$. ■

We have not discussed mixing times in this paper, but the previous proposition
implies that quantum walks on Cayley graphs of $S_n$ reach their limiting
distribution quickly, and when calculating the limiting distribution it is only
necessary to average over times in the range $[0, 2\pi]$. Note that in terms of
implementation, this does not mean that the walk mixes in constant time; some
number of operations that is polynomial in the degree of the graph and in some
accuracy parameter is required to implement such a walk, assuming the ability
to compute the neighbors of each vertex. See [2,8] for further details.



3.2 Cayley Graphs of Sn Generated by Transpositions 



For the Cayley graph of $S_n$ generated by the transpositions, Theorem 5 has
various implications that we discuss in this section. We will require explicit
values for various characters of the symmetric group, which we now mention.
Using the Murnaghan-Nakayama rule it can be shown that

$$\chi_\nu((n)) = \begin{cases}(-1)^{n-k} & \text{if } \nu = (k, 1, \ldots, 1) \text{ for some } k \in \{1, \ldots, n\},\\[4pt] 0 & \text{otherwise,}\end{cases}$$

and $\chi_{(k,1,\ldots,1)}((1, \ldots, 1)) = \dim(\rho_{(k,1,\ldots,1)}) = \binom{n-1}{k-1}$. For the characters at the
transpositions, it is known [13] that

$$\frac{\chi_\nu(\tau)}{\dim(\rho_\nu)} = \binom{n}{2}^{-1}\sum_j\left[\binom{\nu_j}{2} - \binom{\nu'_j}{2}\right].$$

Here, τ is any transposition, ν' is the partition generated by transposing the
Young diagram of ν, while $\nu_j$ and $\nu'_j$ are the components of the partitions ν
and ν'. Substituting these values into Theorem 5 gives



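$$c_\lambda^* U(t) c_\mu = \frac{\sqrt{|C_\lambda||C_\mu|}}{n!}\sum_{\nu\vdash n}\exp\Big(it\sum_j\Big[\binom{\nu_j}{2} - \binom{\nu'_j}{2}\Big]\Big)\chi_\nu(\lambda)\chi_\nu(\mu)$$

for the quantum walk on $\Gamma(S_n, C_{(2,1,\ldots,1)})$, and specifically for the case where
$\mu = (1, \ldots, 1)$ and $\lambda = (n)$ it follows that

$$c_{(n)}^* U(t) c_{(1,\ldots,1)} = \frac{1}{\sqrt{n\cdot n!}}\sum_{k=1}^{n}(-1)^{n-k}\binom{n-1}{k-1}\exp\Big(it\Big[\binom{k}{2} - \binom{n-k+1}{2}\Big]\Big) = \frac{(2i\sin(tn/2))^{n-1}}{\sqrt{n\cdot n!}}.$$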
In particular,

$$\max_t\left|c_{(n)}^* U(t) c_{(1,\ldots,1)}\right|^2 = \frac{2^{2n-2}}{n\cdot n!}, \qquad (1)$$

where the maximum occurs for $t = (2k+1)\pi/n$, $k \in \mathbb{Z}$.

Eq. 1 has the following interpretation. If we start a quantum walk on
$\Gamma(S_n, C_{(2,1,\ldots,1)})$ at the identity element, evolve for any amount of time and
measure, the probability to measure some n-cycle is at most $\frac{2^{2n-2}}{n\cdot n!}$, as opposed
to probability approaching $\frac{1}{n}$ for the classical case. The probability to measure
any particular n-cycle is therefore at most $\frac{2^{2n-2}}{(n!)^2}$, as opposed to some number
approaching $\frac{1}{n!}$ classically. For large n, the probabilities in the quantum case are
thus smaller by a factor of $n!/4^{n-1}$, which grows super-exponentially in n.
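The closed form for $c_{(n)}^*U(t)c_{(1,\ldots,1)}$ and Eq. 1 can be cross-checked by evaluating the hook sum from Theorem 5 directly. The sum uses $\chi_{(k,1,\ldots,1)}$ on n-cycles, the hook dimensions $\binom{n-1}{k-1}$, and the transposition eigenvalue $\binom{k}{2} - \binom{n-k+1}{2}$ obtained from the formula of [13]. This is a sanity-check sketch with hypothetical function names, not part of the paper's argument.

```python
import cmath
import math


def amplitude_sum(n, t):
    """c_(n)^* U(t) c_(1,...,1) as the hook sum of Theorem 5 for the walk on
    Gamma(S_n, C_(2,1,...,1)); only hooks (k, 1, ..., 1) have a nonzero
    character on n-cycles."""
    total = 0
    for k in range(1, n + 1):
        chi_n_cycle = (-1) ** (n - k)
        dim = math.comb(n - 1, k - 1)  # = chi at the identity
        eigenvalue = math.comb(k, 2) - math.comb(n - k + 1, 2)
        total += chi_n_cycle * dim * cmath.exp(1j * t * eigenvalue)
    return total / math.sqrt(n * math.factorial(n))


def amplitude_closed_form(n, t):
    return (2j * math.sin(t * n / 2)) ** (n - 1) / math.sqrt(n * math.factorial(n))


def max_n_cycle_mass(n):
    """Right-hand side of Eq. (1): 2^(2n-2) / (n * n!)."""
    return 2 ** (2 * n - 2) / (n * math.factorial(n))


for n in range(2, 9):
    t = math.pi / n  # a time at which the maximum in Eq. (1) is attained
    print(n, abs(amplitude_sum(n, t)) ** 2, max_n_cycle_mass(n), 1 / n)
```

The loop prints, for each n, the attained maximum, the bound $2^{2n-2}/(n\cdot n!)$ and the classical value $1/n$. For small n the quantum value can exceed $1/n$ (for example n = 6), and it falls below $1/n$ from n = 7 on.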

As discussed in Section 2.1, we will denote by $P_t$ the distribution on $S_n$
obtained by performing the quantum walk on $\Gamma(S_n, C_{(2,1,\ldots,1)})$ for time t starting
at the identity and then measuring. The above analysis gives a lower bound for the
total variation distance of $P_t$ from the uniform distribution:

$$\|P_t - \mathrm{uniform}\| \ge \frac{1}{n} - \frac{2^{2n-2}}{n\cdot n!}$$

for all values of t. This bound follows from considering only the n-cycles, and we
believe the true bound to be much larger. Numerical simulations support this
claim, but thus far we only have exact expressions for the n-cycles.
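A brute-force simulation of this kind is feasible for very small n, since the matrix has n! rows. The sketch below uses hypothetical helper names. It builds $\Gamma(S_4, C_{(2,1,1)})$ explicitly and forms $U(t) = e^{iAt}$ with a plain scaling-and-squaring matrix exponential. It then compares $|U(t)[g,e]|^2$ at each 4-cycle g with $(2\sin(tn/2))^{2n-2}/(n!)^2$, which is what the closed form for $c_{(n)}^*U(t)c_{(1,\ldots,1)}$ gives after dividing by $|C_{(n)}| = (n-1)!$.

```python
import itertools
import math


def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(M, terms=30):
    """exp(M) by scaling and squaring with a truncated Taylor series."""
    n = len(M)
    norm = max(sum(abs(x) for x in row) for row in M)
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    A = [[x / 2 ** s for x in row] for row in M]
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[r + u for r, u in zip(rr, ur)] for rr, ur in zip(result, term)]
    for _ in range(s):
        result = mat_mul(result, result)
    return result


def transposition_cayley_graph(n):
    """Vertices of Gamma(S_n, C_(2,1,...,1)) and its adjacency matrix (g ~ tau g)."""
    perms = list(itertools.permutations(range(n)))
    index = {p: i for i, p in enumerate(perms)}
    adj = [[0.0] * len(perms) for _ in perms]
    for p in perms:
        for a, b in itertools.combinations(range(n), 2):
            # left-multiplying by (a b) swaps the values a and b in p's images
            q = tuple(b if x == a else a if x == b else x for x in p)
            adj[index[q]][index[p]] = 1.0
    return perms, adj


def is_n_cycle(p):
    length, x = 1, p[0]
    while x != 0:
        x, length = p[x], length + 1
    return length == len(p)


def n_cycle_probabilities(n, t):
    """|U(t)[g, e]|^2 for every n-cycle g, with U(t) = exp(iAt)."""
    perms, adj = transposition_cayley_graph(n)
    U = mat_exp([[1j * t * x for x in row] for row in adj])
    e = perms.index(tuple(range(n)))
    return [abs(U[i][e]) ** 2 for i, p in enumerate(perms) if is_n_cycle(p)]


n, t = 4, 0.7
probs = n_cycle_probabilities(n, t)
predicted = (2 * math.sin(t * n / 2)) ** (2 * n - 2) / math.factorial(n) ** 2
print(len(probs), min(probs), max(probs), predicted)
```

All six 4-cycles receive the same probability, as Proposition 4 predicts, and it matches the closed form.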

Given that we have an exact expression for the probability $P_t[g]$ for any
n-cycle g, it is easy to determine the probability associated with any n-cycle in the
limiting distribution. By the periodicity of our walks, we have

$$P[g] = \frac{1}{2\pi}\int_0^{2\pi} P_t[g]\,dt$$

for each $g \in S_n$, and thus for any $g \in C_{(n)}$ we have $P[g] = \frac{1}{(n!)^2}\binom{2n-2}{n-1}$. Somewhat
surprisingly, this average probability associated with reaching a given n-cycle is
not unique to the particular choice of $C_{(2,1,\ldots,1)}$ as a generating set, as shown in
the next subsection.
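The time average can also be evaluated numerically. Dividing the closed form for $c_{(n)}^*U(t)c_{(1,\ldots,1)}$ by $\sqrt{|C_{(n)}|}$ gives $P_t[g] = (2\sin(tn/2))^{2n-2}/(n!)^2$ for a fixed n-cycle g. This is a trigonometric polynomial in t, so averaging it over enough equally spaced points of one period $[0, 2\pi)$ gives the exact mean. The sketch below (hypothetical names) compares that mean with $\binom{2n-2}{n-1}/(n!)^2$ and with the uniform value $1/n!$.

```python
import math


def n_cycle_prob(n, t):
    """P_t[g] for one n-cycle g, walk on Gamma(S_n, C_(2,1,...,1)) from the identity."""
    return (2 * math.sin(t * n / 2)) ** (2 * n - 2) / math.factorial(n) ** 2


def limiting_n_cycle_prob(n):
    """Average of P_t[g] over one period [0, 2*pi).

    P_t[g] only contains frequencies that are multiples of n up to n(n-1), so
    averaging over m > n(n-1) equally spaced points is exact up to rounding."""
    m = n * (n - 1) + 1
    return sum(n_cycle_prob(n, 2 * math.pi * k / m) for k in range(m)) / m


for n in range(2, 9):
    exact = math.comb(2 * n - 2, n - 1) / math.factorial(n) ** 2
    print(n, limiting_n_cycle_prob(n), exact, 1 / math.factorial(n))
```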



3.3 Other Generating Sets 

We have not been able to obtain tractable expressions for the amplitudes associated
with quantum walks for other generating sets besides $C_{(2,1,\ldots,1)}$. However,
we can prove some facts concerning the limiting distributions for such walks in
the case that the generating set consists of all p-cycles for any choice of p. (In
case p is odd, we must keep in mind that only the alternating group is being
generated.) Again we will focus on the probability of reaching n-cycles starting
from the identity.

Consider the quantum walk on $\Gamma(S_n, C_\gamma)$, where γ is any partition. According
to Theorem 5, the probability associated with a given conjugacy class $C_\lambda$ when
starting from a uniform superposition on another class $C_\mu$ after time t is given
by $|c_\lambda^* U(t) c_\mu|^2$, which may be written as



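$$\frac{|C_\lambda||C_\mu|}{(n!)^2}\sum_{\nu,\eta\vdash n}\exp\Big(it|C_\gamma|\Big(\frac{\chi_\nu(\gamma)}{\dim(\rho_\nu)} - \frac{\chi_\eta(\gamma)}{\dim(\rho_\eta)}\Big)\Big)\chi_\nu(\lambda)\chi_\nu(\mu)\chi_\eta(\lambda)\chi_\eta(\mu).$$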

As before, we let P denote the limiting distribution of the walk when starting
from the identity. Since our walks are periodic with period $2\pi$, we therefore have

$$P[g] = \frac{1}{(n!)^2}\sum^{*}_{\nu,\eta\vdash n}\chi_\nu(g)\dim(\rho_\nu)\chi_\eta(g)\dim(\rho_\eta).$$

Here the asterisk denotes that the sum is over all partitions ν, η subject to the
condition

$$\frac{\chi_\nu(\gamma)}{\dim(\rho_\nu)} = \frac{\chi_\eta(\gamma)}{\dim(\rho_\eta)}. \qquad (2)$$

Observe that the choice of generators only affects the average distribution by
determining which pairs other than ν = η are included in the sum. More generally,
the average probability associated with obtaining some element in $C_\lambda$ when
starting the walk on the uniform superposition over $C_\mu$ is given by

$$\frac{|C_\lambda||C_\mu|}{(n!)^2}\sum^{*}_{\nu,\eta\vdash n}\chi_\nu(\mu)\chi_\nu(\lambda)\chi_\eta(\mu)\chi_\eta(\lambda).$$



In the case that g is an n-cycle and $\gamma = (p, 1, \ldots, 1)$, the condition of
Eq. 2 is relatively easy to characterize for those partitions ν and η for which
$\chi_\nu(g)\chi_\eta(g) \neq 0$. Figure 1 summarizes the probability associated with each
n-cycle g in the limiting distribution for the quantum walk on $\Gamma(S_n, C_\gamma)$. Due to
space constraints, the derivation of these probabilities has been omitted. (See
http://arxiv.org/abs/quant-ph/0305182 for a longer version of this paper
containing these details.)
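The entries of Fig. 1 can be reproduced numerically from the starred sum for P[g]. Because $\chi_\nu$ vanishes on n-cycles unless ν is a hook, only hooks $\nu = (n-j, 1^j)$ contribute, with $\chi_\nu(g) = (-1)^j$ and $\dim(\rho_\nu) = \binom{n-1}{j}$. Grouping them by $\chi_\nu(\gamma)/\dim(\rho_\nu)$ implements condition (2). The hook character at a p-cycle used below is our own application of the Murnaghan-Nakayama rule: a length-p rim hook is removed from the end of the first row or from the bottom of the first column. That formula is an assumption of this sketch, as are the function names. The function returns $(n!)^2 P[g]$ as an exact integer.

```python
from fractions import Fraction
from math import comb


def hook_char_at_p_cycle(n, j, p):
    """chi_nu at a p-cycle of S_n for the hook nu = (n-j, 1^j), 0 <= j < n.

    Murnaghan-Nakayama: remove a rim hook of length p and weight by the
    dimension of the remaining hook of size n - p."""
    if p == n:
        return (-1) ** j  # the only removable n-rim hook is the whole diagram
    value = 0
    if j <= n - p - 1:  # last p cells of the first row, height 0
        value += comb(n - p - 1, j)
    if j >= p:  # bottom p cells of the first column, height p - 1
        value += (-1) ** (p - 1) * comb(n - p - 1, j - p)
    return value


def n_cycle_weight(n, p):
    """(n!)^2 * P[g] for an n-cycle g, limiting distribution of the walk on
    Gamma(S_n, C_gamma), gamma = (p, 1, ..., 1), started at the identity."""
    groups = {}
    for j in range(n):
        dim = comb(n - 1, j)
        key = Fraction(hook_char_at_p_cycle(n, j, p), dim)  # condition (2)
        groups[key] = groups.get(key, 0) + (-1) ** j * dim  # chi_nu(g) dim(rho_nu)
    return sum(v * v for v in groups.values())


for n, p in [(6, 4), (7, 3), (7, 5), (7, 6), (9, 6)]:
    print(n, p, n_cycle_weight(n, p))
```

For example, it returns 52 for (n, p) = (6, 4) and 1448 for (7, 3). For p = 2 it returns $\binom{2n-2}{n-1}$, the transposition case.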

We have the following lower bounds on the total variation distance of the limiting
distribution from the uniform distribution. As for the case of the quantum
walk generated by the transpositions, this bound follows just from considering
the n-cycles, and we believe the true distance from uniform to be much larger.

• Let $p \in \{2, \ldots, n\}$ be even, let $\gamma = (p, 1, \ldots, 1) \vdash n$ and let P denote the
limiting distribution of the quantum walk on $\Gamma(S_n, C_\gamma)$. Then

$$\|P - \mathrm{uniform}(S_n)\| \ge \frac{1}{n} - \frac{1}{n\cdot n!}\binom{2n-2}{n-1}.$$

• Let n be odd, let $p \in \{2, \ldots, n\}$ be odd, let $\gamma = (p, 1, \ldots, 1) \vdash n$ and let P
denote the limiting distribution of the quantum walk on $\Gamma(S_n, C_\gamma)$. Then

$$\|P - \mathrm{uniform}(A_n)\| \ge \frac{2}{n} - \frac{2}{n\cdot n!}\binom{2n-2}{n-1} + \frac{1}{n\cdot n!}\binom{n-1}{(n-1)/2}^2.$$
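Parity of n | Parity of p | Range of p | Probability at each n-cycle
even or odd | even | $2 \le p \le \lceil n/2 \rceil$ | $\frac{1}{(n!)^2}\binom{2n-2}{n-1}$
even | even | $\frac{n}{2} + 1 \le p \le n - 1$ | $\frac{2}{(n!)^2}\sum_{k=1}^{n-p}\binom{n-1}{k-1}^2$
odd | even | $\frac{n+1}{2} + 1 \le p \le n - 1$ | $\frac{2}{(n!)^2}\sum_{k=1}^{n-p}\binom{n-1}{k-1}^2 + \frac{4}{(n!)^2}\binom{n-2}{p-1}^2$
even | even | $p = n$ | $\frac{1}{(n!)^2}\binom{2n-2}{n-1}$
even | odd | any | $0$
odd | odd | $2 < p \le \frac{n+1}{2}$ | $\frac{2}{(n!)^2}\binom{2n-2}{n-1} - \frac{1}{(n!)^2}\binom{n-1}{(n-1)/2}^2$
odd | odd | $\frac{n+1}{2} + 1 \le p \le n - 1$ | $\frac{4}{(n!)^2}\sum_{k=1}^{n-p}\binom{n-1}{k-1}^2 + \frac{4}{(n!)^2}\binom{n-2}{p-1}^2$
odd | odd | $p = n$ | $\frac{2}{(n!)^2}\binom{2n-2}{n-1} - \frac{1}{(n!)^2}\binom{n-1}{(n-1)/2}^2$

Fig. 1. Probabilities associated with each n-cycle in the limiting distribution for
$\Gamma(S_n, C_{(p,1,\ldots,1)})$.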

4 Conclusion 

In this paper we have studied some of the properties of continuous-time quantum 
walks on Cayley graphs of the symmetric group. Many questions concerning these 
walks remain unanswered. One obvious question that we have not attempted to 
address in this paper is whether quantum walks on the symmetric group can 
be applied in the context of quantum algorithms. In terms of specific properties 
of these walks, we have focused on the limiting distribution — is the limiting 
distribution bounded away from uniform by a constant? Many other properties 
of these walks may be of interest as well. For instance, the effect of decoherence 
on these walks is an interesting topic to consider. 
Abstract. We consider the problem of distribution-free property testing of
functions. In this setting of property testing, the distance between functions is
measured with respect to a fixed but unknown distribution D on the domain, and
the testing algorithms have an oracle access to random sampling from the domain
according to this distribution D. This notion of distribution-free testing was
previously defined, but no distribution-free property testing algorithm was known
for any (non-trivial) property. By extending known results (from “standard”,
uniform distribution property testing), we present the first such distribution-free
algorithms for two of the central problems in this field:

— A distribution-free testing algorithm for low-degree multivariate polynomials
with query complexity $O(d^2 + d\cdot\varepsilon^{-1})$, where d is the total degree of the
polynomial.

— A distribution-free monotonicity testing algorithm for functions $f : [n]^d \to A$
for low dimensions (e.g., when d is a constant) with query complexity
$O(\log^d n \cdot 2^d/\varepsilon)$.

The same approach that is taken for the distribution-free testing of low-degree
polynomials is shown to apply also to several other problems.



1 Introduction 

The classical notion of decision problems requires an algorithm to distinguish
objects having some property $\mathcal{P}$ from those objects which do not have the property.
Property testing is a recently-introduced relaxation of decision problems, where
algorithms are only required to distinguish objects having the property $\mathcal{P}$ from
those which are at least "ε-far" from every such object. The notion of property
testing was introduced by Rubinfeld and Sudan [35] and has since attracted
a considerable amount of attention. Property testing algorithms (or property 
testers) were introduced for problems in graph theory (e.g. [2, 23, 24, 30]), 
monotonicity testing (e.g. [9, 13, 14, 18, 19, 22]) and other properties (e.g. 
[1, 3, 4, 5, 7, 10, 12, 15, 17, 20, 27, 28, 29, 31, 32, 34]; the reader is referred 
to excellent surveys by Ron [33], Goldreich [21], and Fischer [16] for a presenta- 
tion of some of this work, including some connections between property testing 
and other topics). The main goal of property testers is to avoid “reading” the 
whole object (which requires complexity at least linear in the size of its repre- 
sentation); i.e., to make the decision by reading a small (possibly, selected at 

S. Arora et al. (Eds.): APPROX 2003 + RANDOM 2003, LNCS 2764, pp. 302-317, 2003.
random) fraction of the input (e.g., a fraction of size polynomial in $1/\varepsilon$ and
poly-logarithmic in the size of the representation) and still having a good (say,
at least 2/3) probability of success.

A crucial component in the definition of property testing is that of the distance
between two objects. For the purpose of this definition, it is common to
think of objects as being functions over some domain X. For example, a graph
G may be thought of as a function $f_G : V \times V \to \{0,1\}$ indicating for each
potential edge e whether it exists in the graph. The distance between functions f and g is
then measured by considering the set $X_{f\neq g}$ of all points x where $f(x) \neq g(x)$ and
comparing the size of this set $X_{f\neq g}$ to that of X; equivalently, one may introduce
a uniform distribution over X and measure the probability of picking $x \in X_{f\neq g}$.
Note that property testers access the input function (object) via membership
queries (i.e., the algorithm gives a value x and gets $f(x)$).

It is natural to generalize the above definition of distance between two functions to deal with arbitrary probability distributions D over X, by measuring the probability of $X_{f \neq g}$ according to D. Ideally, one would hope to get distribution-free property testers. A distribution-free tester for a given property P accesses the function using membership queries, as above, and by randomly sampling the fixed but unknown distribution D (this mimics similar definitions from learning theory and is implemented via oracle access to D; see, e.g., [26]¹). As before, the tester is required to accept the given function f with probability at least 2/3 if f satisfies the property P, and to reject it with probability at least 2/3 if f is at least ε-far from P with respect to the distribution D.

Indeed, these definitions of distance with respect to an arbitrary distribution D and of distribution-free testing were already considered in the context of property testing [23]. However, to the best of our knowledge, no distribution-free property tester was known for any (non-trivial) property (besides testing algorithms that follow from the existence of proper learning algorithms in learning theory [23]). Moreover, discouraging impossibility results, due to [23], show that for many graph-theoretic properties (for which testers that work with respect to the uniform distribution are known) no such (efficient) distribution-free tester exists. As a result, most previous work focused on testers for the uniform distribution; some of these algorithms can be generalized to deal with certain (quite limited) classes of distributions (e.g., product distributions [23]), and very few can be modified to be testers with respect to any known distribution (as was observed by [16] regarding the tester presented in [28]), but none is shown to be a distribution-free tester. Let us review some of the central problems studied in the context of property testing that are relevant to the current work.

Low-degree tests for polynomials. The first problem studied in the field of property testing was that of low-degree testing for multivariate polynomials over a finite field, where one wishes to test whether a given function can be represented by a multivariate polynomial of total degree d, or is ε-far from any such polynomial. Later, the problem of low-degree testing played a central role in the development of probabilistically checkable proofs (PCP), where the goal is to probabilistically verify the validity of a given proof. For the problem of low-degree testing, Rubinfeld and Sudan [35] presented a tester with query complexity $O(d^2 + d \cdot \epsilon^{-1})$. This test was further analyzed in [8]. The reader is also referred to [10], where a linearity test (which tests whether a given function acts as a homomorphism between groups) is presented, and to [3, 6, 7, 20] for other related work.

¹ More precisely, distribution-free property testing is the analogue of the PAC+MQ model of learning (which was studied by the learning-theory community mainly via the EQ+MQ model); standard property testing is the analogue of the uniform+MQ model.

Monotonicity testing. Monotonicity has also been the subject of a significant amount of work in the property testing literature (e.g. [9, 13, 14, 15, 18, 19, 22]). In monotonicity testing, the domain X is usually the d-dimensional cube $[n]^d$. A partial order is defined on this domain in the natural way (for $y, z \in [n]^d$, we say that $y \le z$ if each coordinate of y is bounded by the corresponding coordinate of z).² A function f over the domain $[n]^d$ is monotone if whenever $z \ge y$ then $f(z) \ge f(y)$. Testers were developed to deal with both the low-dimensional and the high-dimensional cases (with respect to the uniform distribution over the domain). In what follows, we survey some of the known results on this problem. In the low-dimensional case, d is considered to be small compared to n (and, in fact, it is typically a constant); a successful algorithm for this case is typically one that is polynomial in 1/ε and in log n. The first paper to deal with this case is by Ergün et al. [14], which presented an $O(\log n/\epsilon)$ algorithm for the line (i.e., the case d = 1), and showed that this query complexity cannot be achieved without using membership queries. This algorithm was generalized to any fixed d in [9]. For the case d = 1, there is a lower bound showing that testing monotonicity (for some constant ε) indeed requires $\Omega(\log n)$ queries [15]. In the high-dimensional case, d is considered as the main parameter (and n might be as low as 2); a successful algorithm is typically one that is polynomial in 1/ε and d. This case was first considered by Goldreich et al. [22], who showed an algorithm for testing monotonicity of functions over the boolean (n = 2) d-dimensional hypercube to a boolean range using $O(d/\epsilon)$ queries. This result was generalized in [13] to arbitrary values of n, showing that $O(d \log^2 n/\epsilon)$ queries suffice for testing monotonicity of general functions over $[n]^d$, which is the best known result so far.

1.1 Our Contributions 

Our contributions are distribution-free testers for the two properties mentioned above: low-degree multivariate polynomials and low-dimensional monotone functions. We observe that the approach behind the low-degree test can also be applied to the testing of other properties such as dictatorship and junta functions [17, 32]. These algorithms are the first known distribution-free testers for non-trivial properties. By this, we answer a natural question that has already been raised explicitly by Fischer [16, Subsection 9.3] and is implicit in [23]. We emphasize that our algorithms work for any distribution D without having any information about D.

² In the case d = 1 this yields a linear order.

Distribution-free low-degree testing for polynomials (and more). We show how to generalize the tester presented in [35] to a distribution-free tester with the same query complexity $O(d^2 + d \cdot \epsilon^{-1})$ (up to a multiplicative constant factor of 2). The algorithm and its analysis are presented in Section 3.

The generalization of the uniform tester to a distribution-free one is done, in this case, by adding another stage to the uniform tester. In this new stage, after verifying that the input function f is close to some low-degree polynomial g with respect to the uniform distribution, we check that f is also close to this specific polynomial g with respect to the given distribution D. For this purpose, our approach requires that we be able to calculate the values of g efficiently based on the values of f. This is a generalization of the notion of self-correctors for single functions (see [10]) to classes of functions (which was previously introduced in [35]). We observe that the same approach can be used for distribution-free testing of every property that is testable under the uniform distribution and has a self-corrector in the above sense. The full details of this generalization appear in Section 4.

Distribution-free monotonicity testing. We present a distribution-free monotonicity tester in the low-dimensional hypercube case. Specifically, we present an algorithm whose complexity is $O(\log^d n \cdot 2^d/\epsilon)$ queries. This is done by first considering the one-dimensional case (the "line"). In this case, we prove that an algorithm of [14] can be slightly modified to deal with the distribution-free case with the same query complexity of $O(\log n/\epsilon)$. Though it is possible to modify the original analysis for the distribution-free case, we choose to present an entirely different analysis. We then show how to appropriately generalize this algorithm to deal with higher (yet low) dimensions (a similar generalization approach was used in [9] for the uniform distribution case). The tester for the one-dimensional case and its generalization to higher dimensions appear in Section 5. Finally, we remark that it can be shown that distribution-free testing of monotonicity in the high-dimensional case cannot be done efficiently [11].

It is typical for known property testers to be quite simple, while understanding the property P in question is needed for the analysis of why these algorithms work. Indeed, Goldreich and Trevisan [25] proved that in certain settings this is an inherent phenomenon: they essentially showed (with respect to the uniform distribution) that any graph-theoretic property that can be tested can also be tested (with a small penalty in the complexity) by a "generic" algorithm that samples a random subgraph and decides whether it has some property. Our work is no different in this respect: our algorithms are similar to previously known algorithms, and the main contribution is their analysis, in particular the analysis for the distribution-free case. Moreover, it is somewhat surprising that our distribution-free testers require no dramatically different techniques than those used in the construction and the analysis of previous algorithms (that work for the uniform distribution case). We remark, however, that although all the distribution-free testers presented in this work can be viewed as variations of testers for the uniform distribution, the modifications of the uniform-distribution testers in the various problems are different.³

2 Definitions 

In this section, we formally define the notion of being ε-far from a property P with respect to a given distribution D defined over X, and of distribution-free testing. Assume that the range of the functions in question is A.

Definition 1. Let D and X be as above. The D-distance between functions $f, g : X \to A$ is defined by $dist_D(f,g) = \Pr_{x \sim D}\{f(x) \neq g(x)\}$.

The D-distance of a function f from a property P (i.e., the class of functions satisfying the property P) is $dist_D(f, P) = \min_{g \in P} dist_D(f,g)$.

We say that f is (ε, D)-far from a property P if $dist_D(f, P) > \epsilon$.

When the distribution in question is the uniform distribution over X, we either use U instead of D or (if clear from the context) we omit any reference to the distribution.
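For a finite domain, Definition 1 can be evaluated directly. The following is only an illustrative sketch: the helper names `dist`, `uniform` and `dist_to_class` are hypothetical, distributions are given as explicit probability tables, and the class P is an explicit finite list of functions. It shows that the same pair of functions can be close under U and far under D.

```python
from fractions import Fraction


def dist(f, g, D):
    """dist_D(f, g): total D-probability of the points where f and g disagree.

    f, g: dicts mapping every point of a finite domain X to a value.
    D: dict mapping every point of X to its probability (summing to 1).
    """
    return sum((p for x, p in D.items() if f[x] != g[x]), Fraction(0))


def uniform(X):
    """The uniform distribution U over a finite domain X."""
    X = list(X)
    return {x: Fraction(1, len(X)) for x in X}


def dist_to_class(f, P, D):
    """dist_D(f, P) = min over g in P of dist_D(f, g), for a finite list P."""
    return min(dist(f, g, D) for g in P)


X = range(1, 7)
f = {x: x for x in X}                      # the identity on {1, ..., 6}
g = {x: (x if x < 6 else 0) for x in X}    # differs from f only at x = 6
D = {x: Fraction(0) for x in X}
D[5] = D[6] = Fraction(1, 2)               # all mass on the last two points
consts = [{x: c for x in X} for c in range(7)]  # P = constant functions

print(dist(f, g, uniform(X)), dist(f, g, D))                              # 1/6 1/2
print(dist_to_class(f, consts, uniform(X)), dist_to_class(f, consts, D))  # 5/6 1/2
```

Exact rationals (`Fraction`) are used so that the distances can be compared with ε without rounding issues.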

Next, we define the notion of a distribution-free tester for a given property P.

Definition 2. A distribution-free tester for a property P is a probabilistic oracle machine M, which is given a distance parameter ε > 0, oracle access to an arbitrary function f : X → A, and access to sampling of a fixed but unknown distribution D over X, and which satisfies the following two conditions:

1. If f satisfies P, then $\Pr\{M^{f,D} = \text{Accept}\} = 1$.

2. If f is (ε, D)-far from P, then $\Pr\{M^{f,D} = \text{Accept}\} \le 1/3$.

We note that a more general definition of testers that allows two-sided errors (as discussed in the introduction) is not needed here; all our testers, like many previously known testers, have one-sided error and always accept any function that satisfies the property P in question.

The definition of a uniform distribution tester for a property P can be derived from the previous definition by omitting the sampling oracle (since the tester can sample from the uniform distribution by itself) and by measuring the distance with respect to the uniform distribution.

Notice that since the distribution D in question is arbitrary, it is possible that there are two different functions f and g such that $dist_D(f,g) = 0$. Specifically, it is possible that $f \notin P$ and $g \in P$. Since the notion of testing is meant to be a relaxation of the notion of decision problems, it is required that the algorithm accept (with high probability) functions that satisfy P, but it may reject functions that have distance 0 from P (but do not satisfy P). This definition of distribution-free testing was introduced in [23, Definition 2.1]. In addition, note that the algorithm is allowed to query the value of the input function also at points with probability 0 (which is also the case with membership queries in learning theory).⁴

³ Indeed, in light of [23], there can be no generic transformation of uniform-distribution testers into distribution-free ones.

3 Distribution-Free Low-Degree Testers for Polynomials 

The first problem studied in the field of property testing was that of testing multivariate polynomials (see [3, 6, 7, 10, 20, 35]). Let F be a finite field. In the problem of low-degree testing with respect to the uniform distribution, the tester is given access to a function $f : F^m \to F$, a distance parameter ε, and a degree d, and has to decide whether f is a multivariate polynomial of total degree d, or is at least ε-far (with respect to the uniform distribution) from any degree-d multivariate polynomial (i.e., one has to change the values of at least $\epsilon \cdot |F|^m$ points in order to transform f into a degree-d multivariate polynomial; this implies that, for every degree-d multivariate polynomial g, the probability that a uniformly drawn point x has a value g(x) different from f(x) is at least ε). Rubinfeld and Sudan [35] presented a tester for this problem with query complexity $O(d^2 + d \cdot \epsilon^{-1})$. We show how to modify this tester into a distribution-free tester with the same query complexity (up to a constant factor of 2).

3.1 Preliminaries 

Fix some value for d and assume from now on that $|F| \ge 10d$. To describe the testers (both the one for the uniform distribution and our distribution-free one), we use the following terminology from [35]:

A line in $F^m$ is a set of 10d + 1 points of the form $\{x, x + h, \ldots, x + 10d \cdot h\}$ for some $x, h \in F^m$. The line defined by x and h is denoted $\ell_{x,h}$.

We say that a line $\ell_{x,h}$ is an f-polynomial if there exists a univariate polynomial $P_{x,h}(i)$ of degree d such that $f(x + ih) = P_{x,h}(i)$ for every $0 \le i \le 10d$.

Notice that if f is a multivariate polynomial of total degree at most d, then for every x and h, the line $\ell_{x,h}$ is an f-polynomial.⁵ Given the values of f on a line $\ell_{x,h}$, testing whether this line is an f-polynomial can be done as follows:

- find, using interpolation, a univariate polynomial P(i) of degree d consistent with the values of f at the d + 1 points x, x + h, ..., x + dh (i.e., P(i) = f(x + ih) for every $0 \le i \le d$);

- check, for every $d + 1 \le i \le 10d$, that f(x + ih) = P(i). If so, accept; otherwise reject.

⁴ It is not known whether MQ are essential in general for testing, even in the uniform case (see [33]); this is known only for specific problems such as monotonicity testing (see [14]).

⁵ To see this, write $f(x) = \sum_j a_j \prod_{l=1}^{d_j} x_{k^j_l}$, where $a_j$ is the coefficient of the j'th term in f, $d_j \le d$ is its degree, and $k^j_l$ is the index of the l'th variable in that term (note that $k^j_{l_1}$ is not necessarily different from $k^j_{l_2}$ for $l_1 \neq l_2$). In this case, for every fixed $x = (x_1, \ldots, x_m)$ and $h = (h_1, \ldots, h_m)$, the value $f(x + ih) = \sum_j a_j \prod_{l=1}^{d_j} (x_{k^j_l} + i \cdot h_{k^j_l})$, which, of course, is a univariate polynomial of degree at most d in i.

We show how this basic test is used to build a uniform and a distribution-free 
low-degree test. 
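The line test of Section 3.1 can be sketched in Python as follows. This is an illustration under an added assumption: F is taken to be a prime field GF(p) with p > 10d (so that plain modular arithmetic works), points of $F^m$ are tuples, and f is any callable; the function names are hypothetical.

```python
def interpolate_at(ys, t, p):
    """Value at t of the unique polynomial of degree <= len(ys) - 1 over GF(p)
    that takes the value ys[i] at i = 0, 1, ..., len(ys) - 1 (Lagrange form)."""
    total = 0
    for i, y in enumerate(ys):
        num, den = 1, 1
        for j in range(len(ys)):
            if j != i:
                num = num * (t - j) % p
                den = den * (i - j) % p
        total = (total + y * num * pow(den, -1, p)) % p  # pow(den, -1, p): inverse mod p
    return total


def is_f_polynomial(f, x, h, d, p):
    """Test whether the line l_{x,h} = {x + i*h : 0 <= i <= 10d} is an
    f-polynomial, working over the prime field GF(p) with p > 10d."""
    line = [tuple((xj + i * hj) % p for xj, hj in zip(x, h)) for i in range(10 * d + 1)]
    vals = [f(pt) % p for pt in line]   # 10d + 1 membership queries
    first = vals[: d + 1]               # determines P(i) of degree <= d
    return all(interpolate_at(first, i, p) == vals[i] for i in range(d + 1, 10 * d + 1))


p, d = 101, 2


def f(v):
    return (v[0] ** 2 + 3 * v[0] * v[1] + 5) % p   # total degree 2


def g(v):
    return v[0] ** 3 % p                           # total degree 3


print(is_f_polynomial(f, (4, 7), (1, 2), d, p))    # True
print(is_f_polynomial(g, (4, 7), (1, 2), d, p))    # False
```

The restriction of the cubic g to this line is a cubic in i, so the degree-2 interpolant from the first three points already disagrees with it at i = 3.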

3.2 Low-Degree Test for the Uniform Distribution 

The low-degree test for the uniform distribution is done by randomly sampling $O(d + \epsilon^{-1})$ lines (i.e., by uniformly choosing $x, h \in_U F^m$), and checking that each of these lines is an f-polynomial. The correctness of this algorithm follows immediately from the following theorem ([35, Theorem 9]).

Theorem 1. There exists a constant $c_U$ such that for $0 < \delta < \frac{1}{c_U d}$, if f is a function from $F^m$ to F such that all but at most a δ fraction of the lines $\{\ell_{x,h} \mid x, h \in F^m\}$ are f-polynomials, then there exists a polynomial $g : F^m \to F$ of total degree at most d such that $dist_U(f,g) \le (1 + o(1))\delta$ (provided that $|F| \ge 10d$).

3.3 Distribution-Free Low-Degree Tester 

Denote the class of multivariate polynomials of total degree d by $P_{deg}$. In this section we show that the tester described in the previous subsection can be modified into a distribution-free tester for low-degree multivariate polynomials. That is, we present an algorithm with query complexity $O(d^2 + d \cdot \epsilon^{-1})$ that, given a distance parameter ε, a degree parameter d, and access to random sampling of $F^m$ according to D and to membership queries of a function $f : F^m \to F$, distinguishes, with probability at least 2/3, between the case that f is in $P_{deg}$ and the case that f is (ε, D)-far from $P_{deg}$.

The natural generalization of the uniform-distribution tester above to the distribution-free case would be to replace the sampling of the tested lines by sampling according to the distribution D; i.e., to sample the $O(d + \epsilon^{-1})$ lines by choosing $x \sim D$ and $h \sim U$, and to check that these lines are f-polynomials. However, we do not know whether this modification actually works. Instead, the algorithm we present consists of two stages: in the first stage we simply run the uniform distribution test as is, and check that the function f is ε-close to $P_{deg}$ with respect to the uniform distribution; the second stage is the generalization suggested above. We prove that this combined strategy actually works.

Poly(ε, d)

Set $k \leftarrow \max\{\epsilon^{-1}, c_U \cdot d\}$. Repeat 5k times:

- Choose $x, h \in_U F^m$. If the line $\ell_{x,h}$ is not an f-polynomial, return FAIL.

- Choose $x \in_D F^m$, $h \in_U F^m$. If the line $\ell_{x,h}$ is not an f-polynomial, return FAIL.

return PASS
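A minimal Python sketch of Poly(ε, d), under the same illustrative assumptions as before: F is a prime field GF(p) with p > 10d, and D is available only through a sampling callable `sample_D`. The names are hypothetical, and the default `c_U=100` is a placeholder, since the text only tells us that $c_U \ge 100$.

```python
import math
import random


def _interp(ys, t, p):
    # Value at t of the degree-<=len(ys)-1 polynomial through (i, ys[i]) over GF(p).
    total = 0
    for i, y in enumerate(ys):
        num = den = 1
        for j in range(len(ys)):
            if j != i:
                num = num * (t - j) % p
                den = den * (i - j) % p
        total = (total + y * num * pow(den, -1, p)) % p
    return total


def _line_is_f_poly(f, x, h, d, p):
    # Query f on the 10d+1 points of l_{x,h}; interpolate from the first d+1.
    vals = [f(tuple((a + i * b) % p for a, b in zip(x, h))) % p for i in range(10 * d + 1)]
    return all(_interp(vals[: d + 1], i, p) == vals[i] for i in range(d + 1, 10 * d + 1))


def poly_tester(f, sample_D, m, d, eps, p, c_U=100, rng=random):
    """Sketch of Poly(eps, d) over GF(p)^m (p prime, p > 10d).

    f: membership-query oracle; sample_D(): draws x ~ D.
    c_U is a placeholder for the constant of Theorem 1."""
    k = math.ceil(max(1 / eps, c_U * d))

    def uniform_point():
        return tuple(rng.randrange(p) for _ in range(m))

    for _ in range(5 * k):
        if not _line_is_f_poly(f, uniform_point(), uniform_point(), d, p):  # x, h ~ U
            return "FAIL"
        if not _line_is_f_poly(f, sample_D(), uniform_point(), d, p):       # x ~ D, h ~ U
            return "FAIL"
    return "PASS"


rng = random.Random(0)


def good(v):
    return (3 * v[0] + 5 * v[1] + 7) % 31           # a degree-1 polynomial


def bad(v):
    return 1 if v == (0, 0) else (v[0] + v[1]) % 31  # wrong only at (0, 0)


print(poly_tester(good, lambda: (0, 0), m=2, d=1, eps=0.25, p=31, rng=rng))  # PASS
print(poly_tester(bad, lambda: (0, 0), m=2, d=1, eps=0.25, p=31, rng=rng))
```

The function `bad` is only 1/961-far from a linear polynomial under U, so the uniform stage alone would rarely notice it; but with D concentrated on (0, 0), every second-stage line starts at the corrupted point, and such a line passes only when the direction h is zero.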
Theorem 2. Algorithm Poly(ε, d) is a distribution-free tester for $P_{deg}$; its query complexity is $O(d^2 + d \cdot \epsilon^{-1})$.

The correctness of the algorithm relies on the following lemma: 

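Lemma 1. Let $c_U$ be the constant as above. For every $0 < \delta < \frac{1}{c_U d}$, if f is a function from $F^m$ to F such that

- $\Pr_{x,h \sim U}\{\ell_{x,h} \text{ is not an } f\text{-polynomial}\} \le \delta$, and
- $\Pr_{x \sim D,\, h \sim U}\{\ell_{x,h} \text{ is not an } f\text{-polynomial}\} \le \delta$,

then there exists a polynomial $g : F^m \to F$ of total degree at most d such that $dist_D(f,g) \le \frac{2\delta}{1 - 40\delta} = (1 + o(1)) \cdot 2\delta$ (provided that $|F| \ge 10d$).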

The proof of the above lemma is omitted for lack of space. The proof is 
similar to ones presented in [35] and will appear in the full version of the paper. 



Proof of Theorem 2. To prove that the algorithm is indeed a distribution-free tester for $P_{deg}$, we prove the following two facts:

1. If f is in $P_{deg}$, then the algorithm accepts f with probability 1.

2. If f is (ε, D)-far from $P_{deg}$, then the algorithm Poly(ε, d) rejects f with probability at least 2/3.

As explained before, if f is indeed a multivariate polynomial of total degree d, then every line is an f-polynomial. Hence, such an f is accepted by the tester with probability 1. Assume from now on that f is (ε, D)-far from $P_{deg}$. Notice that, by the definition of k, for $\epsilon' = 1/k \le \epsilon$, f is also (ε', D)-far from $P_{deg}$, and moreover $\epsilon' \le \frac{1}{c_U d}$.

We apply Lemma 1 with $\delta = \frac{\epsilon'}{2 + 40\epsilon'}$ (note that $\delta < \epsilon'/2 < \frac{1}{c_U d}$). It follows that either $\Pr_{x,h \sim U}\{\ell_{x,h} \text{ is not an } f\text{-polynomial}\} > \frac{\epsilon'}{2 + 40\epsilon'}$ or $\Pr_{x \sim D,\, h \sim U}\{\ell_{x,h} \text{ is not an } f\text{-polynomial}\} > \frac{\epsilon'}{2 + 40\epsilon'}$. Otherwise, there exists a degree-d polynomial g such that

$$dist_D(f,g) \le \frac{2 \cdot \frac{\epsilon'}{2 + 40\epsilon'}}{1 - 40 \cdot \frac{\epsilon'}{2 + 40\epsilon'}} = \epsilon',$$

contradicting the fact that the D-distance of f from any such polynomial is greater than ε'.

Assume that the first event occurs. Then the probability that a randomly chosen line $\ell_{x,h}$ is an f-polynomial is at most $1 - \frac{\epsilon'}{2 + 40\epsilon'} = 1 - \frac{1}{2k + 40}$. Hence, the probability that the algorithm accepts f is at most

$$\left(1 - \frac{1}{2k + 40}\right)^{5k} \le e^{-5k/(2k+40)} \le e^{-2} < \frac{1}{3}$$

(the second inequality follows since $c_U \ge 100$ [35], implying that $k \ge 100$, and $5k \ge 2(2k + 40)$ whenever $k \ge 80$). Similarly, if the second event occurs, the probability that a randomly chosen line $\ell_{x,h}$, where $x \sim D$ and $h \sim U$, is an f-polynomial is at most $1 - \frac{1}{2k + 40}$. Hence, as before, the probability that the algorithm accepts f is at most $\left(1 - \frac{1}{2k + 40}\right)^{5k} < \frac{1}{3}$. □
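The arithmetic in the proof above can be checked mechanically. This small sketch (the function names are hypothetical) confirms with exact rational arithmetic that the choice $\delta = \epsilon'/(2 + 40\epsilon')$ with $\epsilon' = 1/k$ makes the bound of Lemma 1 equal to ε', and checks numerically that the acceptance bound is at most $e^{-2}$ for a few values $k \ge 100$.

```python
import math
from fractions import Fraction


def delta_for(k):
    """delta = eps' / (2 + 40 eps') with eps' = 1/k, as chosen in the proof."""
    eps1 = Fraction(1, k)
    return eps1 / (2 + 40 * eps1)


def lemma1_bound(delta):
    """The bound 2*delta / (1 - 40*delta) on dist_D(f, g) from Lemma 1."""
    return 2 * delta / (1 - 40 * delta)


def acceptance_bound(k):
    """(1 - 1/(2k + 40))^(5k): the bound on Pr[Poly accepts] in either case."""
    return (1 - 1 / (2 * k + 40)) ** (5 * k)


for k in (100, 101, 1000, 10**6):
    assert delta_for(k) == Fraction(1, 2 * k + 40)       # so 1 - delta = 1 - 1/(2k+40)
    assert lemma1_bound(delta_for(k)) == Fraction(1, k)  # the Lemma 1 bound is exactly eps'
    assert acceptance_bound(k) <= math.exp(-2) < 1 / 3   # uses 5k / (2k + 40) >= 2
print("all checks passed")
```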
4 Distribution-Free Testing of Properties with
Self-corrector 

A careful examination and manipulation of the distribution-free tester presented in the previous section shows that, in fact, the only two features of low-degree multivariate polynomials used in the construction are:

- the existence of a one-sided error uniform distribution tester for low-degree polynomials, and

- the ability to efficiently compute (with high probability), at every point x of the domain, the correct value of the polynomial g that is close to the input function f, if f is indeed close to a multivariate low-degree polynomial. We refer to this ability as "property self-correction".

We argue that it is possible to construct a distribution-free tester for every property P that satisfies these two conditions. We first define the notion of a "property self-corrector" formally (it has already been defined implicitly and used in [35]), and then introduce a general scheme for obtaining distribution-free testers for a variety of properties that satisfy the conditions.

The notion of a "property self-corrector" is a generalization of the notion of self-correctors for functions introduced by Blum, Luby and Rubinfeld in [10]. A self-corrector for a specific function f is a randomized algorithm that, given oracle access to a function g which is ε-close to f, is able to compute the value of f at every point of the domain. This definition can be generalized to classes of functions, specifically by demanding that all the functions in the class be self-correctable using the same algorithm.

Definition 3. An ε self-corrector for a property P is a probabilistic oracle machine M, which is given oracle access to an arbitrary function f : X → A and satisfies the following condition:

If there exists a function g ∈ P such that $dist_U(f,g) < \epsilon$ (i.e., f is ε-close to P), then $\Pr\{M^f(x) = g(x)\} \ge 2/3$ for every x ∈ X. If f ∈ P, then $\Pr\{M^f(x) = f(x)\} = 1$ for every x ∈ X.
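As a concrete instance of Definition 3, consider the classical self-corrector for linear functions over $GF(2)^n$ from [10], sketched below for illustration (points are encoded as n-bit integers; the function names are ours). A single run outputs $f(x \oplus y) \oplus f(y)$ for a uniformly random y. If f is linear the output is always f(x). If f is δ-close to a linear g, both queried points are uniformly distributed, so by a union bound the output equals g(x) with probability at least 1 - 2δ, which exceeds 2/3 whenever δ < 1/6.

```python
import random


def blr_self_correct(f, x, n, rng=random):
    """One run of the Blum-Luby-Rubinfeld self-corrector for linear functions
    over GF(2)^n: output f(x XOR y) XOR f(y) for a uniformly random y.

    Points of GF(2)^n are encoded as n-bit integers; f maps them to {0, 1}."""
    y = rng.getrandbits(n)
    return f(x ^ y) ^ f(y)


def parity(mask):
    """The linear function v -> <mask, v> over GF(2)."""
    return lambda v: bin(v & mask).count("1") % 2


n, rng = 8, random.Random(1)
g = parity(0b10110101)
z = 0b11110000


def f(v):
    return g(v) ^ (v == z)   # g corrupted at the single point z


# If the input is itself linear, every run returns its value exactly.
assert all(blr_self_correct(g, x, n, rng) == g(x) for x in range(2 ** n))

# At the corrupted point, a run errs only when y is 0 or z (probability 2/256).
runs = [blr_self_correct(f, z, n, rng) for _ in range(1000)]
print(sum(r == g(z) for r in runs) / len(runs))   # close to 1 - 2/256
```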

Note that the definition of a "property self-corrector" refers to distance measured only with respect to the uniform distribution; however, we still use these correctors for the construction of distribution-free testers. Observe that a necessary condition for the existence of an ε-self-corrector for a property P is that for every function f such that $dist_U(f, P) < \epsilon$ (i.e., f is ε-close to P with respect to the uniform distribution), there exists a unique function g ∈ P that is ε-close to f (implying that ε cannot be too large). Notice that the property of monotonicity does not fulfill this requirement.⁶ Hence, the distribution-free monotonicity tester that is presented in the next section requires a different approach.

⁶ Consider, for example, the following function f : [n] → {0, 1}: for every $1 \le i \le n/2$ set f(i) = 1, and for every $n/2 + 1 \le i \le n$ set f(i) = 0. Then f is 1/2-far from monotone, and it is 1/2-close to both constant functions 0 and 1.
Next, we describe the generalized distribution-free testing scheme. Let P be a property, let $T_P$ be a uniform distribution tester for P with query complexity $Q_T$ that has one-sided error, and let $C_P$ be an ε' property self-corrector for P with query complexity $Q_C$. Let ε < ε', and f : X → A.

Tester_P(ε)

Run $T_P(\epsilon)$. If $T_P(\epsilon)$ = FAIL, then return FAIL.
Repeat 2/ε times:

  Choose $x \in_D X$.

  Repeat twice: run $C_P(x)$; if $f(x) \neq C_P(x)$, then return FAIL.
return PASS
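The scheme can be sketched in Python as follows, assuming that $T_P$ and $C_P$ are supplied as callables (this interface is hypothetical). For a toy run we instantiate P as the class of constant functions on a finite domain. This choice is ours, for illustration only. The uniform tester compares f at ⌈2/ε⌉ random points with f at a random anchor point (one-sided), and the self-corrector returns f(y) for a uniformly random y. That corrector is exact when f is constant, and it returns the nearby constant with probability above 2/3 when f is within distance 1/3 of it. So it is a 1/3 self-corrector, and ε = 0.1 satisfies ε < ε'.

```python
import math
import random


def distribution_free_tester(uniform_tester, self_corrector, f, sample_D, eps):
    """Sketch of Tester_P(eps).

    uniform_tester(f, eps) -> bool : one-sided uniform tester T_P (True = PASS)
    self_corrector(f, x)   -> value: one randomized run of C_P at the point x
    sample_D()             -> a point drawn from the unknown distribution D
    """
    if not uniform_tester(f, eps):
        return "FAIL"
    for _ in range(math.ceil(2 / eps)):
        x = sample_D()
        for _ in range(2):                      # two independent corrector runs
            if f(x) != self_corrector(f, x):
                return "FAIL"
    return "PASS"


# Toy instantiation (illustration only): P = constant functions on {0, ..., N-1}.
N = 1000
rng = random.Random(0)


def const_tester(f, eps):
    # One-sided: compare f at random points with f at one random anchor point.
    anchor = f(rng.randrange(N))
    return all(f(rng.randrange(N)) == anchor for _ in range(math.ceil(2 / eps)))


def const_corrector(f, x):
    # If f is delta-close to the constant c, then f(y) = c w.p. >= 1 - delta.
    return f(rng.randrange(N))


def spiked(x):
    return 9 if x == 7 else 5                   # 1/N-close to the constant 5


print(distribution_free_tester(const_tester, const_corrector, lambda x: 5,
                               lambda: rng.randrange(N), eps=0.1))   # PASS
print(distribution_free_tester(const_tester, const_corrector, spiked,
                               lambda: 7, eps=0.1))                  # FAIL
```

With D concentrated on the corrupted point 7, the second stage rejects `spiked` unless all 40 corrector runs happen to query y = 7.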



Theorem 3. Algorithm Tester_P(ε) is a distribution-free tester for P with query complexity $Q_T(\epsilon) + \frac{4}{\epsilon} \cdot Q_C$.

Proof. It is obvious that the query complexity of the algorithm Tester_P(ε) is as required. Hence, we only have to prove the correctness of the algorithm. To do so, we prove the following two facts:

- If f ∈ P, then f is accepted by the algorithm with probability 1.

- If f is (ε, D)-far from P, then f is rejected by Tester_P(ε) with probability at least 2/3.

If f is indeed in P, then it passes the uniform test with probability 1, and the value returned by the self-corrector is always identical to the value of f. Hence, in this case f is accepted by the algorithm. Assume from now on that f is (ε, D)-far from P. In this case we distinguish between two possibilities. If f is (ε, U)-far from P, then the probability that it passes the uniform test is at most 1/3.

If f is (ε, U)-close to P, then there exists a function g ∈ P such that $dist_U(f,g) \le \epsilon < \epsilon'$. However, since $dist_D(f, P) > \epsilon$, we deduce that $dist_D(f,g) > \epsilon$ (in other words, $\Pr_{x \sim D}\{f(x) \neq g(x)\} > \epsilon$). If f is accepted by the algorithm, then one of the following two events happened: either we failed to sample a point at which f and g differ, or we succeeded in sampling such a point, but both runs of the self-corrector failed to compute the value of g at this point. The probability of the first event is at most $(1 - \epsilon)^{2/\epsilon} \le e^{-2}$, and by the definition of a property self-corrector the probability of the second event is at most $(1/3)^2 = 1/9$. Therefore, the total probability that f is accepted by the algorithm is at most $e^{-2} + 1/9 < 1/3$.

Hence, in both cases the probability that f is accepted by the algorithm is at most 1/3. □

Remark 1. We used the assumption that there exists a uniform distribution tester for the property P that has one-sided error. However, the same transformation can also be applied when the uniform distribution tester has two-sided error, except that the resulting distribution-free tester then also has two-sided error.
As was previously stated, the algorithm that was explicitly presented in Section 3 can actually be described as an application of this scheme to the class of low-degree multivariate polynomials. Hence, instead of fully describing the distribution-free tester and proving its correctness, it would have sufficed to show that this property can be tested under the uniform distribution and that it can be self-corrected. This scheme, however, also implies the existence of distribution-free testers for other properties. Among these properties are low-degree multivariate polynomials over GF(2), juntas and dictatorship functions. A function $f : \{0,1\}^n \to \{0,1\}$ is said to be a k-junta if there exists a subset of $\{x_1, \ldots, x_n\}$ of size k that determines the value of f (i.e., f is independent of the other variables). A special case of juntas are dictatorship functions, where a single variable determines the value of the function. These properties (and other related properties) have uniform distribution testers, as was shown in [3, 17, 32]. In addition, they are all subsets of the class of low-degree polynomials over GF(2), which is self-correctable (for example, k-juntas are a special case of degree-k multivariate polynomials), and thus they are self-correctable as well (see [3] and [10]). Therefore, we can apply the scheme described in this section to obtain distribution-free testers for these properties.

Remark 2. Notice that given two properties P and P' such that P' ⊆ P, the fact that P is testable under the uniform distribution does not imply that P' is also testable (to see this, observe, for example, that every property is a subset of the class of all functions, which is clearly testable). However, the fact that P is self-correctable implies that P' is self-correctable (using the same correction algorithm).

5 Distribution-Free Monotonicity Testing on the 
d-Dimensional Cube 

In this section, we present testers for monotonicity over the d-dimensional hypercube with respect to an arbitrary distribution D. As before, we assume D to be fixed but unknown, and besides the ability to sample according to D we assume no knowledge of D. For simplicity, we begin our discussion with the case d = 1, and show that given access to random sampling according to D and to membership queries, there is a distribution-free tester for monotonicity over [n] whose query complexity is $O(\log n/\epsilon)$. This algorithm can be generalized to a distribution-free tester for monotonicity over the d-dimensional hypercube whose query complexity is $O(\log^d n \cdot 2^d/\epsilon)$.

We begin with a few notations and definitions. Denote by [n] the set {1, ..., n}, and by $[n]^d$ the set of d-tuples over [n]. For every two points i and j in $[n]^d$ we say that $i \le j$ if $i_k \le j_k$ for every $1 \le k \le d$. Let $(A, \le_A)$ be some linear order.

Definition 4. We say that a function $f : [n]^d \to A$ is monotone if for every i and j, if $i \le j$ then $f(i) \le_A f(j)$.
Definition 5. Let $f : [n]^d \to A$ be a function. A pair (i, j) is said to be an f-violation if $i < j$ and $f(i) >_A f(j)$.

Let D be any distribution on $[n]^d$, and let S be a subset of $[n]^d$. Define $\Pr_D\{i\} = \Pr_{X \sim D}\{X = i\}$, and $\Pr_D\{S\} = \sum_{i \in S} \Pr_D\{i\}$.
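Definitions 4 and 5 can be checked by brute force on tiny instances. The sketch below is for illustration only. It enumerates $[n]^d$ with `itertools.product`, uses Python's `>` on the range as the linear order $>_A$, and represents D as a dict of point probabilities; the helper names are ours.

```python
from itertools import product


def leq(i, j):
    """The partial order on [n]^d: i <= j iff i_k <= j_k for every coordinate k."""
    return all(a <= b for a, b in zip(i, j))


def violations(f, n, d):
    """All f-violations (i, j): i < j (that is, i <= j and i != j) and f(i) > f(j).
    Brute force over [n]^d = {1, ..., n}^d, so only for tiny n and d."""
    pts = list(product(range(1, n + 1), repeat=d))
    return [(i, j) for i in pts for j in pts if i != j and leq(i, j) and f(i) > f(j)]


def is_monotone(f, n, d):
    return not violations(f, n, d)


def prob_of_set(D, S):
    """Pr_D{S} = sum of Pr_D{i} over i in S; D maps points to probabilities."""
    return sum(D.get(i, 0) for i in S)


def swap_last_two(t):
    return {5: 6, 6: 5}.get(t[0], t[0])   # on [6]: f(i) = i except f(5) = 6, f(6) = 5


print(violations(swap_last_two, 6, 1))                                  # [((5,), (6,))]
print(is_monotone(sum, 4, 2), is_monotone(lambda t: t[0] - t[1], 3, 2))  # True False
print(prob_of_set({(5,): 0.5, (6,): 0.5}, [(6,)]))                      # 0.5
```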

5.1 Testing Monotonicity for the Line (d = 1) 

In this section we consider the case d = 1. Our algorithm is a variant of the algorithm presented in [14] for testing monotonicity with respect to the uniform distribution. However, the analysis presented here for this algorithm is quite different. The algorithm works in phases; in each phase a "center point" is selected according to the distribution D (in the original algorithm, the center point is selected uniformly), and the algorithm looks for a violation of monotonicity involving this center point. The search for a violation is done by randomly sampling in growing neighborhoods of the center point. In other words, in the case d = 1, the only change made to the original algorithm in order to make it distribution-free is that the center points are chosen according to D; the search for violations remains unchanged. It is important to observe that, when dealing with an arbitrary distribution, there is no connection between the distance of the function from monotone (or the probability of the violation) and the number of pairs that form a violation of monotonicity.⁷ Hence, the correctness of the algorithm for the uniform distribution (i.e., the fact that for a function that is far from monotone we find a violation of monotonicity with high probability) does not imply its correctness for the general case.

Algorithm-monotone-1-dim_D(f, ε):
  repeat 2/ε times
    choose i ∈_D [n]
    for k ← 0 ... ⌈log i⌉ do
      repeat 8 times
        choose a ∈_R [2^k]
        if f(i - a) >_A f(i) then return FAIL
    for k ← 0 ... ⌈log(n - i)⌉ do
      repeat 8 times
        choose a ∈_R [2^k]
        if f(i) >_A f(i + a) then return FAIL
  return PASS
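A minimal Python sketch of Algorithm-monotone-1-dim_D. It rests on two assumptions of ours: D is accessed through a sampling callable `sample_D`, and offsets a that would leave [n] are simply skipped, since the pseudocode leaves the boundary implicit. The usage runs it on the example of footnote 7 (with n = 1000), where D puts all of its mass on the two swapped points.

```python
import math
import random


def monotone_1dim_tester(f, n, sample_D, eps, rng=random):
    """Sketch of Algorithm-monotone-1-dim_D over the line [n] = {1, ..., n}.

    f: membership-query oracle on [n]; sample_D(): draws a center point i ~ D.
    Offsets that would leave [n] are skipped (an assumption of this sketch)."""
    for _ in range(math.ceil(2 / eps)):
        i = sample_D()
        # Growing neighborhoods to the left of i: a in [2^k], k = 0..ceil(log i).
        for k in range(math.ceil(math.log2(i)) + 1):
            for _ in range(8):
                a = rng.randint(1, 2 ** k)
                if i - a >= 1 and f(i - a) > f(i):
                    return "FAIL"
        # Growing neighborhoods to the right of i: k = 0..ceil(log(n - i)).
        if i < n:
            for k in range(math.ceil(math.log2(n - i)) + 1):
                for _ in range(8):
                    a = rng.randint(1, 2 ** k)
                    if i + a <= n and f(i) > f(i + a):
                        return "FAIL"
    return "PASS"


rng = random.Random(0)


def swapped(i):
    return {999: 1000, 1000: 999}.get(i, i)   # footnote 7 with n = 1000


print(monotone_1dim_tester(swapped, 1000, lambda: rng.choice([999, 1000]), 0.5, rng))  # FAIL
print(monotone_1dim_tester(lambda i: i // 3, 1000, lambda: rng.randint(1, 1000), 0.1, rng))  # PASS
```

For either center point 999 or 1000, the offset a = 1 (k = 0) already exposes the violation (999, 1000), so the first call fails in its first phase; a monotone function can never produce a FAIL, matching the one-sided error.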



Theorem 4. Algorithm monotone-1-dim_D is a distribution-free monotonicity tester over the line with query complexity $O(\log n/\epsilon)$.

To prove this theorem, we need the following definitions and lemmas. 

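⁷ Observe, for example, the function f : [n] → [n] such that f(i) = i for every $1 \le i \le n - 2$, f(n - 1) = n and f(n) = n - 1. Set the distribution D to be D(n - 1) = D(n) = 1/2. Then (n - 1, n) is the only f-violation, yet the D-distance of f from monotone is 1/2 (whereas under the uniform distribution f is 1/n-close to monotone).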
Lemma 2. Let $f : [n] \to A$ be a function, and let $S \subseteq [n]$ be a set. If for every f-violation (i, j) either i ∈ S or j ∈ S, then there exists a monotone function f' that differs from f only on points in S.

A similar claim was proved in [13]; proof omitted. An immediate conclusion 
of the above lemma is the following: 

Lemma 3. Let $f : [n] \to A$ be a function that is (ε, D)-far from monotone. Given $S \subseteq [n]$, if for every f-violation (i, j) either i ∈ S or j ∈ S, then $\Pr_D\{S\} > \epsilon$.

Definition 6. For an f-violation (i, j), we say that i is active in this violation if

$$|\{k : i < k \le j,\ f(i) >_A f(k)\}| \ge \frac{j - i}{2};$$

similarly, j is active in this violation if $|\{k : i \le k < j,\ f(j) <_A f(k)\}| \ge \frac{j - i}{2}$.

That is, i is active in an f-violation (i, j) if for at least half of the points $i < k \le j$, (i, k) is also an f-violation (i.e., $f(i) >_A f(k)$).

Observation 1: For every /-violation (i,j), at least one of i and j is active in 
(//). (Proof omitted) 

Define the active set of $f$ (denoted $A_f$) as the set of all points that are active in some $f$-violation. Following this observation and applying Lemma 3 to the set $A_f$, if $f$ is $(\varepsilon, D)$-far from monotone then $\Pr_D\{A_f\} > \varepsilon$. We turn now to prove Theorem 4.
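The next sketch shows Definition 6 and Observation 1 in action by brute force on integer-valued functions. A totally ordered range is an assumption here. The sketch reads the two counting sets as $i < k \le j$ and $i \le k < j$. It does two checks on random inputs:

- Observation 1 holds for every violation;
- the consequence of Lemma 2 holds, namely that $f$ restricted to the complement of $A_f$ is already non-decreasing.

The helper names are hypothetical.

```python
import random
from itertools import combinations


def endpoints_active(f, i, j):
    """For an f-violation (i, j), return (i is active, j is active)."""
    half = (j - i) / 2
    i_active = sum(1 for k in range(i + 1, j + 1) if f[i] > f[k]) >= half
    j_active = sum(1 for k in range(i, j) if f[j] < f[k]) >= half
    return i_active, j_active


def active_set(f):
    """A_f: all positions that are active in some f-violation (0-based)."""
    act = set()
    for i, j in combinations(range(len(f)), 2):
        if f[i] > f[j]:
            i_active, j_active = endpoints_active(f, i, j)
            if i_active:
                act.add(i)
            if j_active:
                act.add(j)
    return act


rng = random.Random(1)
for _ in range(200):
    f = [rng.randint(0, 5) for _ in range(12)]
    for i, j in combinations(range(len(f)), 2):
        if f[i] > f[j]:
            assert any(endpoints_active(f, i, j))  # Observation 1
    rest = [f[k] for k in range(len(f)) if k not in active_set(f)]
    assert rest == sorted(rest)  # every violation touches A_f
```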

Proof. It is easy to see that the query complexity of the algorithm is as required. Hence, we are left to prove that it is indeed a distribution-free tester. The fact that every monotone function $f$ is accepted by the algorithm follows immediately from its definition. From now on, assume that $f$ is $(\varepsilon, D)$-far from monotone; we prove that $f$ is rejected with probability at least $\frac{2}{3}$. Our algorithm may fail to detect that $f$ is not monotone only if one of the following two events occurs:

1. None of the points sampled by the algorithm according to $D$ is in $A_f$.

2. The algorithm picked at least one point $i \in A_f$, but failed to detect that $i$ belongs to some $f$-violation.

It is easily verified that the probability of the first event is at most $(1-\varepsilon)^{2/\varepsilon} \le e^{-2} < \frac{1}{6}$. We now turn to bound the probability of the second event. By the definition of $A_f$, for every $i \in A_f$ there is a $j$ such that either $(i,j)$ or $(j,i)$ is an $f$-violation and $i$ is active in this violation. Assume w.l.o.g. that $(i,j)$ is an $f$-violation. For $k = \min\{l : 2^l \ge j - i\}$ (i.e., $k$ is the smallest integer s.t. $j \le i + 2^k$), we can claim that $|\{l : i < l \le i + 2^k,\ f(i) >_A f(l)\}|$ is more than $\frac{1}{4} \cdot 2^k$. This is due to the fact that $j - i > 2^{k-1}$, and since $i$ is active in the $f$-violation $(i,j)$, for at least half the points $l$ between $i$ and $j$ (i.e., at least $\frac{j-i}{2} > 2^{k-2}$ points) the pair $(i,l)$ is an $f$-violation. The probability that the algorithm fails to find an $f$-violation for this $k$ is at most $(\frac{3}{4})^8 < \frac{1}{6}$, and hence the probability of the second event is at most $\frac{1}{6}$, implying that the total probability that the algorithm will wrongly accept $f$ is at most $\frac{1}{3}$. □
Remark 3. In the journal version of [14], an additional tester for monotonicity on the line, called "Sort-Check-II", is presented. This algorithm can also be transformed into a distribution-free monotonicity tester over the line. However, we do not know whether it can be generalized to higher dimensions.

We saw how to test monotonicity over the one-dimensional hypercube (the line) when the distance is measured with respect to an arbitrary distribution. It is possible to generalize this algorithm to the $d$-dimensional case. The full details of the generalized algorithm and its analysis are omitted from this version and will appear in the full version of this paper.
Abstract. Let $X_{d,n}$ be an $n$-element subset of $\{0,1\}^d$ chosen uniformly at random, and denote by $P_{d,n} := \mathrm{conv}\, X_{d,n}$ its convex hull. Let $\Delta_{d,n}$ be the density of the graph of $P_{d,n}$ (i.e., the number of one-dimensional faces of $P_{d,n}$ divided by $\binom{n}{2}$). Our main result is that, for any function $n(d)$, the expected value of $\Delta_{d,n(d)}$ converges (with $d \to \infty$) to one if, for some arbitrary $\varepsilon > 0$, $n(d) \le (\sqrt{2} - \varepsilon)^d$ holds for all large $d$, while it converges to zero if $n(d) \ge (\sqrt{2} + \varepsilon)^d$ holds for all large $d$.



1 Introduction 

Polytopes whose vertices have coordinates in $\{0,1\}$ (0/1-polytopes) are the objects of study in large parts of polyhedral combinatorics (see [10]). Since that theory has started to grow, people have been interested in the graphs (defined by the vertices and the one-dimensional faces) of the polytopes under investigation. The main reason for this interest was, of course, the role played by polytope graphs with respect to linear programming and, in particular, the simplex algorithm.

Later it was recognized that the graphs of the 0/1-polytopes associated with certain combinatorial objects (such as matchings in a graph or bases of a matroid) might also yield good candidates for neighborhood structures with respect to the construction of random walks for random generation of the respective objects. A quite important (yet unsolved) problem arising in this context is the question whether the graphs of 0/1-polytopes have good expansion properties (see [3,5,7]).

Little is known about the graphs of general 0/1-polytopes [13]. Among the few exceptions are results about their diameters [8] and their cycle structures [9]. Particularly striking is the fact that several special 0/1-polytopes associated with combinatorial problems have quite dense graphs. The most prominent example of this is probably the cut polytope $\mathrm{CUT}_k$, i.e., the convex hull of the characteristic vectors of those subsets of edges of the complete graph $K_k$ that form cuts in $K_k$. Barahona and Mahjoub [1] proved that the graph of $\mathrm{CUT}_k$ is complete, i.e., its density equals one (where the density of a graph $G = (V,E)$ is $|E|/\binom{|V|}{2}$). Since the dimension of $\mathrm{CUT}_k$ is $d = \binom{k}{2}$ and there are $n = 2^{k-1}$ cuts in $K_k$, the cut polytopes yield an infinite series of $d$-dimensional 0/1-polytopes with (roughly) $2^{c\sqrt{d}}$ vertices (for some constant $c$) and graph-density one.

In this paper, we investigate the graph density of a typical (i.e., random) 0/1-polytope. The (perhaps surprising) result is that the high density of the graphs of several 0/1-polytopes important in polyhedral combinatorics (such as the cut polytopes) is in fact not atypical at all. Our main result is the following theorem, where $\mathrm{Exp}[\cdot]$ denotes the expected value.

Theorem 1. Let $n : \mathbb{N} \to \mathbb{N}$ be a function, and let $P_{d,n(d)} := \mathrm{conv}\, X_{d,n(d)}$ with an $n(d)$-element subset $X_{d,n(d)}$ of $\{0,1\}^d$ that is chosen uniformly at random. Denote by $\Delta_{d,n(d)}$ the density of the graph of $P_{d,n(d)}$.

(i) If there is some $\varepsilon > 0$ such that $n(d) \le (\sqrt{2} - \varepsilon)^d$ for all sufficiently large $d$, then $\lim_{d\to\infty} \mathrm{Exp}[\Delta_{d,n(d)}] = 1$.

(ii) If there is some $\varepsilon > 0$ such that $n(d) \ge (\sqrt{2} + \varepsilon)^d$ for all sufficiently large $d$, then $\lim_{d\to\infty} \mathrm{Exp}[\Delta_{d,n(d)}] = 0$.

There is a similar threshold phenomenon for the volumes of random 0/1-polytopes. Let $P_{d,n(d)}$ be the convex hull of $n(d)$ points in $\{0,1\}^d$ that are chosen independently uniformly at random (possibly with repetitions). Dyer, Füredi, and McDiarmid [2] proved that the limit (for $d \to \infty$) of the expected value of the $d$-dimensional volume of $P_{d,n(d)}$ is zero if, for some $\varepsilon > 0$, $n(d) \le (2/\sqrt{e} - \varepsilon)^d$ holds for all sufficiently large $d$, and it is one if, for some $\varepsilon > 0$, $n(d) \ge (2/\sqrt{e} + \varepsilon)^d$ holds for all sufficiently large $d$. Since $2/\sqrt{e} < 1.214$ and $\sqrt{2} > 1.414$, one can deduce (we omit the details) the following result from this and Theorem 1. It may be a bit surprising, because the only $d$-dimensional 0/1-polytope with $d$-dimensional volume equal to one is the 0/1-cube $\mathrm{conv}\{0,1\}^d$, which has graph-density only $d/(2^d-1)$.

Corollary 1. For every $\delta > 0$ there are (infinitely many) 0/1-polytopes with both graph density and volume at least $(1 - \delta)$.
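The graph density of the cube mentioned above can be checked by brute force for small $d$. Two vertices of $\mathrm{conv}\{0,1\}^d$ span an edge exactly when they differ in one coordinate. This gives $d\,2^{d-1}$ edges among $\binom{2^d}{2}$ pairs. The following sketch compares this count with $d/(2^d-1)$ using exact fractions. It is illustrative only, and the function name is ours.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def cube_graph_density(d):
    """Exact graph density of the d-dimensional 0/1-cube, by brute force."""
    vertices = list(product((0, 1), repeat=d))
    edges = sum(1 for v, w in combinations(vertices, 2)
                if sum(a != b for a, b in zip(v, w)) == 1)
    return Fraction(edges, comb(len(vertices), 2))


for d in range(1, 7):
    assert cube_graph_density(d) == Fraction(d, 2 ** d - 1)
```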

Another threshold result that is related to our work is due to Füredi [4]. He showed that, in the setting of Theorem 1, the limit (for $d \to \infty$) of the probability that $P_{d,n(d)}$ contains the center of the 0/1-cube is zero if, for some $\varepsilon > 0$, $n(d) \le (2 - \varepsilon) \cdot d$ holds for all sufficiently large $d$, and it is one if, for some $\varepsilon > 0$, $n(d) \ge (2 + \varepsilon) \cdot d$ holds for all sufficiently large $d$. The material in Sections 2.2, 2.3, and 2.4 of our paper is very much inspired by Füredi's work.

The aim of Sections 2 and 3 is to prove Theorem 1. Since it is a bit more convenient, we switch from 0/1-polytopes to polytopes whose vertices have coordinates in $\{-1,+1\}$ ($\pm 1$-polytopes). Recall that the density of a graph equals the probability that a randomly chosen pair of its nodes is adjacent. Hence Propositions 4 and 5 (Section 3), together with Proposition 3, imply Theorem 1 (with the $\varepsilon$'s in Propositions 4 and 5 replaced by $\log_2 \frac{\sqrt{2}}{\sqrt{2}-\varepsilon}$ and $\log_2 \frac{\sqrt{2}+\varepsilon}{\sqrt{2}}$, respectively).

We close with a few remarks in Section 4.
2 The Long-Edge Probability $\tau(k, m)$

We define $Q^d := \{-1,+1\}^d$ and $Q^d_* := Q^d \setminus \{-\mathbf{1}, \mathbf{1}\}$ (where $\mathbf{1}$ is the all-one vector). For $v, w \in Q^d$ denote by $Q(v,w)$ the subset of all points in $Q^d$ that agree with $v$ and $w$ in all components where $v$ and $w$ agree. Thus, $Q(v,w)$ is the vertex set of the smallest face of $\mathrm{conv}\, Q^d$ containing $v$ and $w$. The dimension of this face is
$$\mathrm{dist}(v,w) := \#\{ i \in \{1,\dots,d\} : v_i \ne w_i \}$$
(the Hamming distance of $v$ and $w$). Let $Q^*(v,w) := Q(v,w) \setminus \{v,w\}$.

We refer to [12] for all notions and results from polytope theory that we rely on. For a polytope $P$, we denote by $V(P)$ and $E(P)$ the sets of vertices and edges of $P$, respectively. Recall that, for $X \subseteq Q^d$, we have $V(\mathrm{conv}\, X) = X$.

The following fact is essential for our treatment. It can easily be deduced 
from elementary properties of convex polytopes. 

Lemma 1. For two vertices $v$ and $w$ of a $\pm 1$-polytope $P \subseteq \mathbb{R}^d$ we have
$$\{v,w\} \in E(P) \iff \mathrm{conv}\{v,w\} \cap \mathrm{conv}(P \cap Q^*(v,w)) = \emptyset.$$

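Throughout this section, let $Y_{k,m} \in \binom{Q^k_*}{m}$ (the $m$-element subsets of $Q^k_*$) be drawn uniformly at random and define
$$\tau(k,m) := \mathrm{Prob}[\mathrm{conv}(Y_{k,m}) \cap \mathrm{conv}\{-\mathbf{1}, \mathbf{1}\} = \emptyset].$$
Thus, $\tau(k,m)$ is the probability that the "long edge" $\mathrm{conv}\{-\mathbf{1}, \mathbf{1}\}$ is an edge of the polytope $\mathrm{conv}(Y_{k,m} \cup \{-\mathbf{1}, \mathbf{1}\})$. The next lemma follows from Lemma 1.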

Lemma 2. Let $X_{d,n} \in \binom{Q^d}{n}$ be chosen uniformly at random, defining the polytope $P_{d,n} := \mathrm{conv}\, X_{d,n}$. Choose a two-element subset $\{v,w\}$ of $X_{d,n}$ uniformly at random. Then, for every $k \in \{1,\dots,d\}$ and $m \in \{0,\dots,\min\{2^k - 2, n - 2\}\}$, we have the equation
$$\mathrm{Prob}[\{v,w\} \in E(P_{d,n}) \mid \mathrm{dist}(v,w) = k,\ \#(X_{d,n} \cap Q^*(v,w)) = m] = \tau(k,m).$$

Via Lemma 2, asymptotic bounds on $\tau(k,m)$ will turn out to be important for the proofs in Section 3. In fact, we will basically compute (or estimate) the probability $\pi(d,n)$ (see Section 3) that two randomly chosen vertices of a $d$-dimensional random $\pm 1$-polytope with $n$ vertices are adjacent. We do this by partitioning the probability space into the events "$\mathrm{dist}(v,w) = k$ and $\#(X_{d,n} \cap Q^*(v,w)) = m$" for all $k \in \{1,\dots,d\}$ and $m \in \{0,\dots,\min\{2^k - 2, n - 2\}\}$.

For the study of $\tau(k,m)$, it is convenient to consider the conditional probability
$$\alpha(k,m) := \mathrm{Prob}[\mathrm{conv}(Y_{k,m}) \cap \mathrm{conv}\{-\mathbf{1}, \mathbf{1}\} = \emptyset \mid Y_{k,m} \cap (-Y_{k,m}) = \emptyset],$$
which is related to $\tau(k,m)$ in the following way.
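Lemma 3. For $0 \le m \le 2^k - 2$ we have
$$\tau(k,m) = \frac{\binom{2^{k-1}-1}{m}\, 2^m}{\binom{2^k-2}{m}}\, \alpha(k,m).$$

Proof. Clearly, $\mathrm{conv}(Y_{k,m}) \cap \mathrm{conv}\{-\mathbf{1}, \mathbf{1}\} = \emptyset$ implies $Y_{k,m} \cap (-Y_{k,m}) = \emptyset$: if both $y$ and $-y$ belong to $Y_{k,m}$, then the origin $\frac{1}{2}(y + (-y))$ lies in both convex hulls. Thus, the statement in the lemma is due to the fact that the number of sets $Y' \in \binom{Q^k_*}{m}$ with $Y' \cap (-Y') = \emptyset$ is $\binom{2^{k-1}-1}{m} \cdot 2^m$. Indeed, the $2^k - 2$ points of $Q^k_*$ form $2^{k-1} - 1$ antipodal pairs, and such a $Y'$ is obtained by choosing $m$ of these pairs and one point from each.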
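The counting step in the proof of Lemma 3 is easy to confirm for small $k$. The brute-force sketch below enumerates the $m$-subsets of $Q^k_*$ that contain no antipodal pair and compares the count with $\binom{2^{k-1}-1}{m}2^m$. It also checks that the prefactor of Lemma 3 never exceeds one. The function names are ours.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def q_star(k):
    """Q^k_*: all of {-1,+1}^k except the all-ones vector and its negative."""
    ones, minus_ones = (1,) * k, (-1,) * k
    return [v for v in product((-1, 1), repeat=k) if v not in (ones, minus_ones)]


def antipodal_free_count(k, m):
    """Number of m-subsets Y of Q^k_* with Y and -Y disjoint."""
    count = 0
    for Y in combinations(q_star(k), m):
        members = set(Y)
        if all(tuple(-x for x in y) not in members for y in Y):
            count += 1
    return count


def lemma3_prefactor(k, m):
    """binom(2^(k-1)-1, m) * 2^m / binom(2^k-2, m)."""
    return Fraction(comb(2 ** (k - 1) - 1, m) * 2 ** m, comb(2 ** k - 2, m))


for k in (2, 3, 4):
    for m in range(0, 2 ** k - 1):
        assert antipodal_free_count(k, m) == comb(2 ** (k - 1) - 1, m) * 2 ** m
        assert lemma3_prefactor(k, m) <= 1
```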

We will first show that $\alpha(k,m)$ can be interpreted as a conditional probability that a random $m$-element subset of a certain vector configuration in $\mathbb{R}^{k-1}$ does not contain the origin in its convex hull (Section 2.1). The latter probability is then related to the expected number of chambers in a certain random hyperplane arrangement. This number of chambers is finally estimated via a well-known bound due to Harding (Section 2.2).

As a point of reference for the proofs in Section 3, let us state the following monotonicity result here, whose (straightforward) proof we omit.

Lemma 4. For $0 \le m \le 2^k - 3$, we have $\tau(k,m) \ge \tau(k,m+1)$.

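2.1 The Vector Configuration $V_r$

Let $\varphi : \mathbb{R}^{r+1} \to H_{\mathbf{1}} \to \mathbb{R}^r$ denote the orthogonal projection of $\mathbb{R}^{r+1}$ onto the hyperplane $H_{\mathbf{1}} := \{x \in \mathbb{R}^{r+1} : \mathbf{1}^T x = 0\}$, followed by the orthogonal projection to the first $r$ coordinates. We denote by $V_r := \varphi(Q^{r+1}_*)$ the image of $Q^{r+1}_*$ under the projection $\varphi$. We omit the simple proof of the following result.

Lemma 5. The projection $\varphi$ is one-to-one on $Q^{r+1}_*$.

Lemma 6. For $Z_{r,m} \in \binom{V_r}{m}$ chosen uniformly at random, we have
$$\alpha(r+1,m) = \mathrm{Prob}[0 \notin \mathrm{conv}(Z_{r,m}) \mid Z_{r,m} \cap (-Z_{r,m}) = \emptyset].$$

Proof. Since $\mathrm{conv}\, Y_{k,m} \cap \mathrm{conv}\{-\mathbf{1}, \mathbf{1}\} = \emptyset$ holds if and only if $0 \notin \mathrm{conv}\, \varphi(Y_{k,m})$ holds, the claim follows from Lemma 5 (because $Y_{k,m} \cap (-Y_{k,m}) = \emptyset$ is equivalent to $\varphi(Y_{k,m}) \cap (-\varphi(Y_{k,m})) = \emptyset$).

With $V_r^+ := \varphi\{v \in Q^{r+1}_* : v_{r+1} = +1\}$, we have $V_r = V_r^+ \cup (-V_r^+)$ and $V_r^+ \cap (-V_r^+) = \emptyset$. For any fixed finite subset $S \subseteq \mathbb{R}^r$ and a uniformly at random chosen $\varepsilon \in \{-1,+1\}^S$, denote $\alpha(S) := \mathrm{Prob}[0 \notin \mathrm{conv}\{\varepsilon_s s : s \in S\}]$.

Lemma 7. Let $Z^+_{r,m} \in \binom{V_r^+}{m}$ be chosen uniformly at random. Then we have
$$\alpha(r+1,m) = \mathrm{Exp}[\alpha(Z^+_{r,m})].$$

Proof. This follows from Lemma 6.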
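Lemma 5 and the decomposition $V_r = V_r^+ \cup (-V_r^+)$ can be checked exactly for small $r$. The sketch below implements $\varphi$ as the orthogonal projection onto $H_{\mathbf{1}}$, i.e., $x \mapsto x - \frac{\mathbf{1}^T x}{r+1}\mathbf{1}$, followed by dropping the last coordinate. It uses exact rational arithmetic. The helper names are ours.

```python
from fractions import Fraction
from itertools import product


def phi(v):
    """Project v onto H_1 = {x : sum(x) = 0}, then keep the first r coordinates."""
    mean = Fraction(sum(v), len(v))
    return tuple(x - mean for x in v[:-1])


def check_vector_configuration(r):
    ones, minus_ones = (1,) * (r + 1), (-1,) * (r + 1)
    q_star = [v for v in product((-1, 1), repeat=r + 1) if v not in (ones, minus_ones)]
    images = [phi(v) for v in q_star]
    injective = len(set(images)) == len(q_star)  # Lemma 5
    v_plus = {phi(v) for v in q_star if v[-1] == 1}
    neg_v_plus = {tuple(-x for x in p) for p in v_plus}
    return injective and (v_plus | neg_v_plus) == set(images) and not (v_plus & neg_v_plus)


assert all(check_vector_configuration(r) for r in range(1, 7))
```

Note that $\varphi$ sends both $\mathbf{1}$ and $-\mathbf{1}$ to the origin. This is why these two points are excluded from $Q^{r+1}_*$.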
2.2 Hyperplane Arrangements

For $s \in \mathbb{R}^r \setminus \{0\}$ let $H(s) := \{x \in \mathbb{R}^r : s^T x = 0\}$. The two connected components of $\mathbb{R}^r \setminus H(s)$ are denoted by $H^+(s)$ and $H^-(s)$, where $s \in H^+(s)$. For a finite subset $S \subseteq \mathbb{R}^r \setminus \{0\}$ denote by $\mathcal{H}(S) := \{H(s) : s \in S\}$ the hyperplane arrangement defined by $S$. The connected components of $\mathbb{R}^r \setminus \bigcup_{s \in S} H(s)$ are the chambers of $\mathcal{H}(S)$. We denote the number of chambers of $\mathcal{H}(S)$ by $\chi(S)$.

Observation 1. Let $C$ be a chamber of $\mathcal{H}(S)$ for some finite subset $S \subseteq \mathbb{R}^r \setminus \{0\}$. For each $s \in S$, we have either $C \subseteq H^+(s)$ or $C \subseteq H^-(s)$. Defining $\varepsilon(C)_s := +1$ in the first, and $\varepsilon(C)_s := -1$ in the second case, we may assign a sign vector $\varepsilon(C) \in \{-1,+1\}^S$ to each chamber $C$ of $\mathcal{H}(S)$. This assignment is injective.

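Lemma 8. For each finite subset $S \subseteq \mathbb{R}^r \setminus \{0\}$, the following equation holds:
$$\#\{\varepsilon \in \{-1,+1\}^S : 0 \notin \mathrm{conv}\{\varepsilon_s s : s \in S\}\} = \chi(S).$$

Proof. Let $S \subseteq \mathbb{R}^r \setminus \{0\}$ be finite. By the Farkas Lemma (linear programming duality), for each $\varepsilon \in \{-1,+1\}^S$, we have $0 \notin \mathrm{conv}\{\varepsilon_s s : s \in S\}$ if and only if there is some $h \in \mathbb{R}^r$ such that $h^T(\varepsilon_s s) > 0$ holds for all $s \in S$, which in turn is equivalent to
$$h \in H^{\varepsilon_s}(s)$$
for all $s \in S$ (writing $H^{+1}(s) := H^+(s)$ and $H^{-1}(s) := H^-(s)$). Since the latter condition is equivalent to $\varepsilon$ being the sign vector of some chamber of $\mathcal{H}(S)$, the statement of the lemma follows.

Lemma 7 and Lemma 8 immediately yield the following result.

Lemma 9. For $Z^+_{r,m} \in \binom{V_r^+}{m}$ chosen uniformly at random, we have
$$\alpha(r+1,m) = \frac{1}{2^m} \cdot \mathrm{Exp}[\chi(Z^+_{r,m})].$$

The following upper bound on $\chi(\cdot)$ will (via Lemma 9) yield upper bounds on $\alpha(\cdot,\cdot)$ that are sufficient for our needs. We denote $b(p,q) := \sum_{i=0}^{p} \binom{q}{i}$.

Theorem 2 (Harding, see Winder [11, p. 816]). For $S \in \binom{\mathbb{R}^r \setminus \{0\}}{m}$ we have
$$\chi(S) \le 2\, b(r-1, m-1).$$

2.3 Bounds on $\tau(k, m)$

Proposition 1. For $0 < m \le 2^k - 2$ the following inequality holds:
$$\tau(k,m) \le \frac{b(k-2, m-1)}{2^{m-1}}.$$

Proof. With $r = k - 1$, Lemma 3 (whose prefactor is at most one), Lemma 9, and Theorem 2 yield this.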
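For $r = 2$, Lemma 8 and Theorem 2 can be checked directly. Take $m$ pairwise non-parallel vectors. They define $m$ distinct lines through the origin, hence $2m$ chambers, and $2m$ equals Harding's bound $2b(1, m-1)$. The sketch below counts the sign vectors of Lemma 8 by brute force. It relies on a planar fact: nonzero points avoid $0$ in their convex hull exactly when they fit into an open half-plane, i.e., when some angular gap between them exceeds $\pi$. Restricting to the plane is our simplification for illustration.

```python
import math
import random
from itertools import product
from math import comb


def b(p, q):
    """b(p, q) = sum_{i=0}^{p} binom(q, i)."""
    return sum(comb(q, i) for i in range(p + 1))


def origin_outside_hull(points):
    """For nonzero points in R^2: True iff 0 is not in their convex hull."""
    angles = sorted(math.atan2(y, x) for x, y in points)
    gaps = [t2 - t1 for t1, t2 in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2 * math.pi - angles[-1])  # wrap-around gap
    return max(gaps) > math.pi


def count_sign_vectors(S):
    """#{eps in {-1,+1}^S : 0 not in conv{eps_s * s}} (left side of Lemma 8)."""
    return sum(origin_outside_hull([(e * x, e * y) for e, (x, y) in zip(eps, S)])
               for eps in product((-1, 1), repeat=len(S)))


def random_nonparallel(m, rng):
    """m random integer vectors in R^2, no two of them parallel."""
    S = []
    while len(S) < m:
        v = (rng.randint(-9, 9), rng.randint(-9, 9))
        if v != (0, 0) and all(v[0] * w[1] - v[1] * w[0] != 0 for w in S):
            S.append(v)
    return S


rng = random.Random(3)
for m in range(1, 8):
    assert count_sign_vectors(random_nonparallel(m, rng)) == 2 * m == 2 * b(1, m - 1)
```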
In fact, one can prove that, if $m$ is not too large relative to $k$, then the bound of Proposition 1 is asymptotically sharp as $k$ tends to infinity. Since we do not need the result here, we omit the proof. Besides Winder's theorem cited in Theorem 2, it relies on the fact that the probability of an $l \times l$ matrix with entries from $\{-1,+1\}$ (chosen uniformly at random) being singular converges to zero for $l$ tending to infinity (see [6]).

Proposition 2. For $m(k) \in o(2^k)$, we have
$$\lim_{k\to\infty} \left( \tau(k, m(k)) - \frac{b(k-2, m(k)-1)}{2^{m(k)-1}} \right) = 0.$$



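2.4 A Threshold for $\tau(k, m)$

For $x \in \mathbb{R}$, let
$$\Phi(x) := \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt,$$
i.e., $\Phi$ is the (cumulative) distribution function of the standard normal distribution.

Lemma 10 (de Moivre–Laplace theorem). For each $\mu \in \mathbb{R}$, the following holds:
$$\lim_{q\to\infty} \frac{1}{2^q} \sum_{i=0}^{\lfloor q/2 + \mu\sqrt{q} \rfloor} \binom{q}{i} = \Phi(2\mu).$$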
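Lemma 10 is easy to check numerically. The normalized partial binomial sum equals $P[\mathrm{Bin}(q, \frac12) \le q/2 + \mu\sqrt{q}]$. Since $\mathrm{Bin}(q,\frac12)$ has standard deviation $\sqrt{q}/2$, the cutoff lies $2\mu$ standard deviations above the mean. The sketch below compares the sum with $\Phi(2\mu)$, computing $\Phi$ through `math.erf`.

```python
import math
from math import comb


def partial_binomial_sum(q, mu):
    """(1 / 2^q) * sum_{i=0}^{floor(q/2 + mu*sqrt(q))} binom(q, i)."""
    top = min(q, math.floor(q / 2 + mu * math.sqrt(q)))
    return sum(comb(q, i) for i in range(top + 1)) / 2 ** q


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


for mu in (-1.0, -0.5, 0.0, 0.5):
    print(mu, round(partial_binomial_sum(4000, mu), 4), round(Phi(2 * mu), 4))
```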

Theorem 3. For each $\varepsilon > 0$, we have
$$\lim_{k\to\infty} \tau(k, \lceil (2+\varepsilon)k \rceil) = 0.$$

Proof. Let $\varepsilon > 0$ be fixed, and define, for each $k$, $m^+(k) := \lceil (2+\varepsilon)k \rceil$. Let $\delta > 0$ be arbitrarily small, and choose $\mu < 0$ such that
$$\Phi(2\mu) \le \frac{\delta}{2}. \qquad (1)$$
Due to $\lim_{k\to\infty} m^+(k)/k = 2 + \varepsilon$, we have, for large enough $k$,
$$k - 2 \le \frac{m^+(k)-1}{2} + \mu\sqrt{m^+(k)-1}. \qquad (2)$$
Due to Proposition 1, we have
$$\tau(k, m^+(k)) \le \frac{b(k-2, m^+(k)-1)}{2^{m^+(k)-1}}. \qquad (3)$$
Since $b(\cdot,\cdot)$ is monotonically increasing in the first component, (2) yields that the right-hand side of (3) is bounded from above by
$$\frac{1}{2^{m^+(k)-1}} \sum_{i=0}^{\lfloor (m^+(k)-1)/2 + \mu\sqrt{m^+(k)-1} \rfloor} \binom{m^+(k)-1}{i}. \qquad (4)$$
By Lemma 10 (with $q$ substituted by $m^+(k) - 1$), (4) may be bounded from above by $\Phi(2\mu) + \frac{\delta}{2}$ for all large enough $k$ (because of $\lim_{k\to\infty} m^+(k) = \infty$). Thus, from (1) we obtain
$$\tau(k, m^+(k)) \le \delta$$
for all large enough $k$. □
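The bound of Proposition 1 is the lower tail $P[\mathrm{Bin}(m-1,\frac12) \le k-2]$, which is why the threshold sits near $m \approx 2k$. The purely numerical sketch below evaluates this upper bound on $\tau(k,m)$ in two regimes:

- for $m = \lceil (2+\varepsilon)k \rceil$, where the proof of Theorem 3 drives it to zero;
- for $m = \lfloor (2-\varepsilon)k \rfloor$, where it tends to one.

In the second regime it is only an upper bound, so it illustrates Theorem 4 rather than proving it.

```python
import math
from math import comb


def b(p, q):
    """b(p, q) = sum_{i=0}^{p} binom(q, i)."""
    return sum(comb(q, i) for i in range(p + 1))


def prop1_bound(k, m):
    """Upper bound b(k-2, m-1) / 2^(m-1) on tau(k, m) from Proposition 1."""
    return b(k - 2, m - 1) / 2 ** (m - 1)


eps = 0.5
for k in (10, 40, 160, 640):
    above = prop1_bound(k, math.ceil((2 + eps) * k))
    below = prop1_bound(k, math.floor((2 - eps) * k))
    print(k, f"{above:.3e}", f"{below:.6f}")
```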



Exploiting Proposition 2, one can also prove the following result. It complements Theorem 3, but since we will not need it in our treatment, we do not give a proof here.

Theorem 4. For each $\varepsilon > 0$ we have
$$\lim_{k\to\infty} \tau(k, \lfloor (2-\varepsilon)k \rfloor) = 1.$$



3 The Edge Probability $\pi(d,n)$

Throughout this section, let the set $X_{d,n} \in \binom{Q^d}{n}$ be drawn uniformly at random, $P_{d,n} := \mathrm{conv}\, X_{d,n}$, and let $\{v,w\} \in \binom{X_{d,n}}{2}$ be chosen uniformly at random as well. Our aim is to determine the probability
$$\pi(d,n) := \mathrm{Prob}[\{v,w\} \in E(P_{d,n})].$$
Let us further denote
$$\pi_k(d,n) := \mathrm{Prob}[\{v,w\} \in E(P_{d,n}) \mid \mathrm{dist}(v,w) = k].$$
Since $\{v,w\}$ is uniformly distributed over $\binom{Q^d}{2}$, the distance $\mathrm{dist}(v,w)$ has the same distribution as the number of positive components of a point chosen uniformly at random from $Q^d \setminus \{-\mathbf{1}\}$. Therefore, the following equation holds.

Lemma 11.
$$\pi(d,n) = \sum_{k=1}^{d} \frac{\binom{d}{k}}{2^d - 1}\, \pi_k(d,n).$$
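The distance distribution behind Lemma 11 can be confirmed by enumerating all pairs of $Q^d$ for small $d$. Each of the $2^d$ points has $\binom{d}{k}$ partners at distance $k$, so the fraction of pairs at distance $k$ is $\binom{d}{k}/(2^d-1)$. The helper `edge_probability` below is our own name. It simply evaluates the right-hand side of Lemma 11 for given conditional probabilities $\pi_k(d,n)$.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product
from math import comb


def distance_distribution(d):
    """Exact distribution of dist(v, w) for a uniformly random pair {v, w} of Q^d."""
    Q = list(product((-1, 1), repeat=d))
    counts = Counter(sum(a != b for a, b in zip(v, w)) for v, w in combinations(Q, 2))
    total = comb(2 ** d, 2)
    return {k: Fraction(c, total) for k, c in counts.items()}


def edge_probability(d, pi_k):
    """Right-hand side of Lemma 11; pi_k maps k in 1..d to pi_k(d, n)."""
    return sum(Fraction(comb(d, k), 2 ** d - 1) * pi_k[k] for k in range(1, d + 1))


for d in range(1, 7):
    dist = distance_distribution(d)
    assert all(dist[k] == Fraction(comb(d, k), 2 ** d - 1) for k in range(1, d + 1))
```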



The following result, stating that $\pi(d,\cdot)$ is monotonically decreasing, is quite plausible. Its straightforward proof is omitted here.

Proposition 3. The function $\pi(d,\cdot)$ is monotonically decreasing, i.e., for $3 \le n \le 2^d - 1$, we have $\pi(d,n) \ge \pi(d,n+1)$.

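The next result implies part (i) of Theorem 1 (see the remarks at the end of Section 1).

Proposition 4. For each $\varepsilon > 0$, we have
$$\lim_{d\to\infty} \pi\big(d, \lfloor 2^{(\frac12-\varepsilon)d} \rfloor\big) = 1.$$

Proof. Let $\varepsilon > 0$, and define $n^-(d) := \lfloor 2^{(\frac12-\varepsilon)d} \rfloor$. For each $\mu > 0$, denote
$$K^-_\mu(d) := \{k \in \mathbb{Z} : 1 \le k \le \tfrac{d}{2} + \mu\sqrt{d}\}$$
and
$$\pi^-_\mu(d) := \min\{\pi_k(d, n^-(d)) : k \in K^-_\mu(d)\}.$$
Then, due to Lemma 11, we have
$$\pi(d, n^-(d)) \ge \sum_{k \in K^-_\mu(d)} \frac{\binom{d}{k}}{2^d - 1}\, \pi^-_\mu(d). \qquad (5)$$
For every $\nu > 0$, this implies (by Lemma 10) that
$$\pi(d, n^-(d)) \ge (\Phi(2\mu) - \nu) \cdot \pi^-_\mu(d)$$
holds for all large enough $d$. Since $\Phi(2\mu) - \nu$ can be made arbitrarily close to one, it remains to prove, for all $\mu > 0$,
$$\lim_{d\to\infty} \pi^-_\mu(d) = 1. \qquad (6)$$
With
$$\gamma_k := \mathrm{Prob}[X_{d,n^-(d)} \cap Q^*(v,w) = \emptyset \mid \mathrm{dist}(v,w) = k],$$
we have, for each $k \in K^-_\mu(d)$,
$$\pi_k(d, n^-(d)) \ge \gamma_k \ge \gamma_{\lfloor d/2 + \mu\sqrt{d} \rfloor} \qquad (7)$$
(see Lemma 1). Clearly,
$$\mathrm{Exp}[\#(X_{d,n^-(d)} \cap Q^*(v,w)) \mid \mathrm{dist}(v,w) = k] = (n^-(d) - 2)\, \frac{2^k - 2}{2^d - 2},$$
and thus the estimation
$$\mathrm{Exp}[\#(X_{d,n^-(d)} \cap Q^*(v,w)) \mid \mathrm{dist}(v,w) = k] \le n^-(d) \cdot 2^{k-d}$$
holds for each $k$. By Markov's inequality, this implies
$$\mathrm{Prob}[\#(X_{d,n^-(d)} \cap Q^*(v,w)) \ge 1 \mid \mathrm{dist}(v,w) = k] \le n^-(d) \cdot 2^{k-d} \qquad (8)$$
for each $d$ and $k$. For $k = \lfloor d/2 + \mu\sqrt{d} \rfloor$, (8) yields
$$\mathrm{Prob}[\#(X_{d,n^-(d)} \cap Q^*(v,w)) \ge 1 \mid \mathrm{dist}(v,w) = \lfloor d/2 + \mu\sqrt{d} \rfloor] \le 2^{-\varepsilon d + \mu\sqrt{d}} \qquad (9)$$
for all $d$. Since $d \cdot 2^{-\varepsilon d + \mu\sqrt{d}} < 1$ holds for large enough $d$, (9) implies $\gamma_{\lfloor d/2 + \mu\sqrt{d} \rfloor} \ge 1 - \frac{1}{d}$ for large enough $d$. Therefore,
$$\lim_{d\to\infty} \gamma_{\lfloor d/2 + \mu\sqrt{d} \rfloor} = 1$$
holds, which, by (7), finally implies (6). □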
The next result yields part (ii) of Theorem 1 (see the remarks at the end of Section 1).

Proposition 5. For each $\varepsilon > 0$, we have
$$\lim_{d\to\infty} \pi\big(d, \lceil 2^{(\frac12+\varepsilon)d} \rceil\big) = 0.$$

Proof. Let $\varepsilon > 0$, and define $n^+(d) := \lceil 2^{(\frac12+\varepsilon)d} \rceil$. For each $\mu > 0$, denote
$$K^+_\mu(d) := \{k \in \mathbb{Z} : \tfrac{d}{2} - \mu\sqrt{d} < k \le d\},$$
and define
$$\pi^+_\mu(d) := \max\{\pi_k(d, n^+(d)) : k \in K^+_\mu(d)\}. \qquad (10)$$
Then, due to Lemma 11, we have
$$\pi(d, n^+(d)) \le \sum_{k=1}^{\lfloor d/2 - \mu\sqrt{d} \rfloor} \frac{\binom{d}{k}}{2^d - 1} + \pi^+_\mu(d).$$
Thus, for every $\nu > 0$, by Lemma 10,
$$\pi(d, n^+(d)) \le \Phi(-2\mu) + \nu + \pi^+_\mu(d)$$
holds for all large enough $d$. Since $\Phi(-2\mu) + \nu$ can be made arbitrarily small, it remains to prove, for all $\mu > 0$,
$$\lim_{d\to\infty} \pi^+_\mu(d) = 0. \qquad (11)$$
For $k \in \{1,\dots,d\}$ and $m \in \{0,\dots,2^k - 2\}$, we define
$$\xi_k(m) := \mathrm{Prob}[\#(X_{d,n^+(d)} \cap Q^*(v,w)) = m \mid \mathrm{dist}(v,w) = k]$$
(i.e., $\xi_k(0)$ is defined like $\gamma_k$ in the proof of Proposition 4, with $n^+(d)$ in place of $n^-(d)$). Then we have (see Lemma 2)
$$\pi_k(d, n^+(d)) = \sum_{m=0}^{2^k-2} \xi_k(m)\, \tau(k,m). \qquad (12)$$
Since $\tau(k,\cdot)$ is monotonically non-increasing by Lemma 4, we thus can estimate
$$\pi_k(d, n^+(d)) \le \sum_{m=0}^{3k-1} \xi_k(m) + \tau(k, 3k)$$
for each $k \in K^+_\mu(d)$. This yields, again for each $k \in K^+_\mu(d)$,
$$\pi_k(d, n^+(d)) \le 3d \cdot \max\{\xi_k(m) : 0 \le m \le 3d-1\} + \max\{\tau(k', 3k') : k' \in K^+_\mu(d)\}. \qquad (13)$$



On the Graph-Density of Random 0/1-Polytopes 327 



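According to Theorem 3,
$$\lim_{d\to\infty} \max\{\tau(k', 3k') : k' \in K^+_\mu(d)\} = 0$$
holds. Hence, by (13) and (10), equation (11) can be proved by showing
$$\lim_{d\to\infty} \Big(3d \cdot \max\{\xi_k(m) : 0 \le m \le 3d-1,\ k \in K^+_\mu(d)\}\Big) = 0. \qquad (14)$$
Let us first calculate (using the notation $(a)_b := a(a-1)\cdots(a-b+1)$)
$$\xi_k(m) = \frac{\binom{2^k-2}{m}\binom{2^d-2^k}{n^+(d)-m-2}}{\binom{2^d-2}{n^+(d)-2}} = \binom{n^+(d)-2}{m} \cdot \frac{(2^d-2^k)_{n^+(d)-m-2}}{(2^d-2)_{n^+(d)-m-2}} \cdot \frac{(2^k-2)_m}{(2^d-n^+(d)+m)_m}, \qquad (15)$$
where the left, the middle, and the right factor of (15) may be bounded from above by $(2^d)^m$, $\big(1 - \frac{2^k-2}{2^d-2}\big)^{n^+(d)-m-2}$, and $(2^d)^m$, respectively. Moreover, for $k \ge 2$ (which holds for all $k \in K^+_\mu(d)$ once $d$ is large) we have $\frac{2^k-2}{2^d-2} \ge \frac{1}{2^{d-k+1}}$, and $1 - \frac{1}{2^{d-k+1}} \ge \frac12$. Thus, we obtain, for $0 \le m \le 3d-1$,
$$\xi_k(m) \le 2^{2dm} \Big(1 - \frac{1}{2^{d-k+1}}\Big)^{n^+(d)-3d-1} \le 2^{6d^2+3d+1} \Big(1 - \frac{1}{2^{d-k+1}}\Big)^{n^+(d)}. \qquad (16)$$
For $k \in K^+_\mu(d)$, we have $d - k + 1 < \frac{d}{2} + \mu\sqrt{d} + 1$, hence
$$\Big(1 - \frac{1}{2^{d-k+1}}\Big)^{n^+(d)} \le \Big(1 - \frac{1}{2^{d/2+\mu\sqrt{d}+1}}\Big)^{2^{(\frac12+\varepsilon)d}} = \left[\Big(1 - \frac{1}{2^{d/2+\mu\sqrt{d}+1}}\Big)^{2^{d/2+\mu\sqrt{d}+1}}\right]^{2^{\varepsilon d - \mu\sqrt{d} - 1}}. \qquad (17)$$
For $d$ tending to infinity, the expression in the square brackets of (17) converges to $\frac{1}{e} < \frac{1}{2}$ (where $e = 2.7182\ldots$ is Euler's number). Therefore, (17) and (16) imply $\xi_k(m) \le 2^{6d^2+3d+1} \cdot (1/2)^{2^{\varepsilon d - \mu\sqrt{d} - 1}}$ (for $k \in K^+_\mu(d)$, $0 \le m \le 3d-1$, and for large enough $d$). Since $2^{\varepsilon d - \mu\sqrt{d} - 1}$ grows faster than any polynomial in $d$, this finally yields (14), and therefore completes the proof. □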



4 Remarks 

The threshold for the function $\tau(\cdot,\cdot)$ described in Theorems 3 and 4 is much sharper than we needed for our purposes (proof of Proposition 5). The sharper result may, however, be useful in investigations of more structural properties of the graphs of random 0/1-polytopes. A particularly interesting such question is whether these graphs have good expansion properties with high probability.
Abstract. Let $y$ be a positive real number and let $\{X_i\}$ be an infinite sequence of Bernoulli random variables with the following property: in every realization of the random variables, $\sum_i E[X_i \mid X_1, X_2, \dots, X_{i-1}] \le y$. We specify a function $F(x,y)$ such that, for every positive integer $x$ and every positive real $y$, $P(\sum_i X_i \ge x) \le F(x,y)$; moreover, for every $x$ and $y$, $F(x,y)$ is the best possible upper bound. We give an interpretation of this stochastic process as a gambling game, characterize optimal play in this game, and explain how our results can be applied to the analysis of multi-stage randomized rounding algorithms, giving stronger results than can be obtained using the traditional Hoeffding bounds and martingale tail inequalities.



1 Introduction 

Consider the following gambling game. A player starts with a fortune of $y$ and a goal of $x$. At each step the player chooses a bet $p \in (0,1]$ and tosses a coin with probability of heads $p$. His fortune is reduced by $p$, and he scores a success if the coin comes up heads. He wins the game if he achieves $x$ successes while maintaining a nonnegative fortune. A function $G(x,y)$ can serve as the function $F$ mentioned in the Abstract if and only if, for all $(x,y)$, $G(x,y)$ is an upper bound on the success probability of all strategies with fortune $y$ and goal $x$. Our main result is a uniformly optimal choice of this function.

Theorem 1. Let $x$ be any positive integer and $y$ any positive real number. Let $F(x,y)$ denote the supremum, over all strategies, of the probability of achieving $x$ successes with fortune $y$. Then $F(x,y)$ is specified recursively as follows:
$$F(x,y) = \begin{cases} 1 & \text{if } x \le y,\\ (y - x + 1) + (x - y)\, F(x, x-1) & \text{if } y < x < y + 1,\\ \int_{0}^{y} e^{-z}\, F(x-1, y-z)\, dz & \text{if } y + 1 \le x. \end{cases}$$

Although we do not have a closed form for $F(x,y)$, we can easily compute an upper bound which is good enough for our purposes.

Corollary 1. Let $s = (x - \lfloor y \rfloor)/y$. If $s > 1$, then $F(x,y) \le \left(\frac{e^{s-1}}{s^s}\right)^{y}$.

S. Arora et al. (Eds.): APPROX 2003+RANDOM 2003, LNCS 2764, pp. 329-340, 2003.
© Springer-Verlag Berlin Heidelberg 2003
Proof. $F(x,y)$ is less than or equal to the probability that a Poisson random variable with mean $y$ is greater than or equal to $x - \lfloor y \rfloor$. The result follows using a Chernoff bound for the tail of the Poisson distribution. □
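The proof of Corollary 1 compares $F(x,y)$ with a Poisson tail. The sketch below computes $P[\mathrm{Poisson}(y) \ge x - \lfloor y \rfloor]$ by summing the probability mass function. It then checks this tail against the Chernoff-type bound $(e^{s-1}/s^s)^y$. We assume that this is the form of the bound stated in Corollary 1; it is the standard bound $P[X \ge a] \le e^{-y}(ey/a)^a$ rewritten with $a = sy$.

```python
import math


def poisson_tail(mean, a):
    """P(Poisson(mean) >= a) for an integer a >= 0."""
    term, cdf = math.exp(-mean), 0.0
    for i in range(a):
        cdf += term               # adds P(X = i)
        term *= mean / (i + 1)
    return max(0.0, 1.0 - cdf)


def chernoff_poisson_bound(x, y):
    """(e^(s-1) / s^s)^y with s = (x - floor(y)) / y; requires s > 1."""
    s = (x - math.floor(y)) / y
    if s <= 1:
        raise ValueError("bound needs s > 1")
    return (math.exp(s - 1) / s ** s) ** y


for y in (0.5, 1.0, 2.5, 4.0):
    for a in range(math.floor(y) + 1, math.floor(y) + 9):
        if a > y:                 # equivalent to s > 1
            x = a + math.floor(y)
            assert poisson_tail(y, a) <= chernoff_poisson_bound(x, y) + 1e-12
```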

Assume that infinitesimal bets are allowed; the precise meaning of an infinitesimal bet will be specified in Section 2. (If all bets must be positive reals, then a success probability arbitrarily close to $F(x,y)$ can be achieved by placing suitably small positive bets instead of the infinitesimal bets in the strategy below.)

Theorem 2 (Best Strategy). The following strategy achieves the success probability $F(x,y)$:

Strategy G on $(x,y)$:
  if $y \ge x$ then bet 1;
  if $x - 1 < y < x$ then bet $y - x + 1$;
  if $y \le x - 1$ then continue placing infinitesimal bets until a success occurs.
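The recursion of Theorem 1 can be evaluated numerically for small goals. The sketch below follows the three cases literally and approximates the integral with the midpoint rule, which is our own choice of quadrature. Its cost grows exponentially in $x$, so it is meant only for very small goals. For $x = 2$ and $y = 1$ the integral is $\int_0^1 e^{-z}(1-z)\,dz = 1/e$. This is the value obtained by the infinitesimal-betting phase of Strategy G, followed by betting the whole remaining fortune.

```python
import math


def F(x, y, n=400):
    """Numerically evaluate the recursion of Theorem 1 (midpoint rule, n nodes)."""
    if x <= 0:
        return 1.0                # goal already reached
    if y <= 0:
        return 0.0                # no fortune left
    if x <= y:
        return 1.0
    if x < y + 1:
        return (y - x + 1) + (x - y) * F(x, x - 1, n)
    h = y / n                     # case y + 1 <= x: integral over z in [0, y]
    return h * sum(math.exp(-(j + 0.5) * h) * F(x - 1, y - (j + 0.5) * h, n)
                   for j in range(n))


print(F(1, 0.3), F(2, 1.0), 1 / math.e)
```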



In order to explain the link between the gambling game and multistage randomized rounding algorithms, we first present an abstract setting for the traditional single-stage randomized rounding algorithms [5]. Consider a mixed integer program of the following form:

Minimize $z$ subject to:
  Integrality Constraints: $x_i \in \{0,1\}$, $i = 1, 2, \dots, n$;
  Covering Constraints: $\sum_{r \in S_i} x_r = 1$, $i = 1, 2, \dots, t$;
  Resource Constraints: $\sum_{r \in T_j} x_r \le c_j z$, $j = 1, 2, \dots, m$.

Each set $S_i$ or $T_j$ is a subset of $\{1, 2, \dots, n\}$, and the sets $S_i$ are disjoint. Each $x_i$ represents an activity, such as the selection of a path in a graph. Each covering constraint requires that one activity be selected from a specified set; for example, in an integer multicommodity flow problem we might require that a given source-sink pair be joined by a path. Each resource constraint represents a bound on some resource; in a multicommodity flow problem the resource might be a vertex or edge, with $c_j$ representing its nominal capacity and $T_j$ the set of paths that consume a unit of that capacity. The variable $z$ represents the maximum amount by which the capacity of any resource is exceeded.

Randomized rounding begins by solving a linear programming relaxation in which the integrality constraint on each variable $x_i$ is replaced by the constraint $0 \le x_i \le 1$. Let $(y_1, y_2, \dots, y_n)$ be the optimal solution to this linear program and let $z^*$ be the optimal value. Randomization is then used to select exactly one variable in each set $S_i$ to be set equal to 1. Variable $x_r$ is selected with probability $y_r$. This rounding process gives a feasible solution to the integer program.

Let us consider the effect of this rounding process on the resource constraints. For the $j$th resource constraint let $p_{ij} = \sum_{r \in S_i \cap T_j} y_r$. Then $p_{ij}$ is the probability that a unit of resource $j$ is used to satisfy covering constraint $i$. Thus the total consumption of resource $j$ is distributed as $\sum_i X_{ij}$, where the $X_{ij}$ are independent Bernoulli random variables and $P(X_{ij} = 1) = p_{ij}$. The expected value of the sum of these random variables is at most $z^* c_j$. The Hoeffding bound on sums of independent Bernoulli random variables is used to bound the probability that the total usage of resource $j$ exceeds a target value $z c_j$, and a union bound is used to obtain an upper bound on the probability that some resource exceeds its target value.
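The single-stage rounding step and the resulting resource usage can be sketched as follows. This is a toy illustration of ours: the index sets and LP values are made up, and the LP solve itself is omitted. Each covering set $S_i$ contributes a Bernoulli variable $X_{ij}$ with mean $p_{ij}$ to resource $j$, so the expected usage of resource $j$ is $\sum_i p_{ij}$.

```python
import random


def randomized_rounding(S, y, rng=random):
    """Select exactly one index in each (disjoint) set S_i, index r with probability y[r].

    Assumes sum(y[r] for r in S_i) == 1 for every covering set S_i.
    """
    x = [0] * len(y)
    for Si in S:
        chosen = rng.choices(Si, weights=[y[r] for r in Si])[0]
        x[chosen] = 1
    return x


def resource_usage(x, T):
    """Usage sum_{r in T_j} x_r of every resource j."""
    return [sum(x[r] for r in Tj) for Tj in T]


def expected_usage(S, T, y):
    """sum_i p_ij with p_ij = sum_{r in S_i and T_j} y_r."""
    return [sum(sum(y[r] for r in Si if r in Tj) for Si in S) for Tj in T]


# Toy instance: two covering sets, two resources.
S = [[0, 1], [2, 3, 4]]
T = [[0, 2], [1, 3, 4]]
y = [0.5, 0.5, 0.2, 0.3, 0.5]
rng = random.Random(7)
x = randomized_rounding(S, y, rng)
print(x, resource_usage(x, T), expected_usage(S, T, y))
```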

In a multistage randomized rounding algorithm, a sequence of mixed integer programs of the above form is solved. The resources and their capacities are the same in all these programs, but in all other respects the structure of the $k$th program may depend on the solutions constructed for the $k - 1$ integer programs preceding it. We wish to bound the total usage of each resource over all the stages. For each resource $j$ this total usage is a sum of Bernoulli random variables; for each set $S_i \cap T_j$ in the $k$th integer program there is a Bernoulli random variable with mean $p_{ij}$. Because of the adaptiveness in the choice of integer programs, the Hoeffding bound can only be used separately for each integer program, but not for the entire multistage process, since it requires that the Bernoulli random variables be independent. However, our gambling game is applicable to the multistage process because it allows the parameters of successive Bernoulli random variables to be dependent. Here the fortune is defined as $c_j$ times the sum of the optimal values of the linear programs, and the bets are defined as the $p_{ij}$ occurring in all the stages (for fixed resource $j$). In effect, the gambling game assumes that an adversary chooses the successive $p_{ij}$ adaptively, with the goal of maximizing total resource usage, subject to a constraint on the sum of the $p_{ij}$ over all stages. Note that martingale tail inequalities are not useful in this setting because they are not sensitive to this global constraint.

As a specific illustration we refine a bicriterion optimization result due to Ravi [6]. Motivated by the Telephone broadcast problem, Ravi gave a polynomial-time algorithm for constructing a spanning tree of small diameter and small maximum degree in a graph $G$. He showed that, if $G$ has a spanning tree of diameter at most $\Delta$ and maximum degree at most $D^*$, then his algorithm produces a spanning tree of diameter $O(\Delta \log n)$ and maximum degree $O(D^* \log n + \log^2 n)$ with high probability. Our analysis of the same algorithm using the gambling game shows that the algorithm produces a spanning tree of diameter $O(\Delta \log n)$ and maximum degree $O(D^* \log n)$ with high probability.

Subsequent to Ravi's paper, Bar-Noy, Guha, Naor and Schieber also addressed the problem of constructing a short tree of small degree. In [1], they presented an algorithm which constructs a tree of diameter $O(\Delta \log n)$ and maximum degree $O((D^* + \Delta)\log n)$. The algorithm relies on a version of the randomized rounding theorem from [3], which exploits the fact that the sum of the absolute values of the entries of any column of the constraint matrix is small. We note that using the version from [7,4], which exploits matrices whose columns have few non-zero entries, would still give a bound on the degree that would depend on $\Delta$. (The Telephone broadcast problem now has a much better approximation algorithm [2], but that algorithm is purely combinatorial and no longer relies either on linear programming or on trees of small height and degree; however, we consider the bicriteria problem of constructing trees of small height and degree as interesting in its own right.)

2 Definitions 

Definition 1. Given a non-negative integer $x$ called the goal and a non-negative real number $y$ called the fortune, a game is defined recursively as follows: if $x = 0$, the game is a win; if $x > 0$ and $y = 0$, the game is lost. If $x, y > 0$, then the game consists of a (finite or infinite) sequence of bets $(p_i)$ such that $\sum_i p_i \le y$, along with, for each $i$, a game for goal $x - 1$ and fortune $y - \sum_{j \le i} p_j$. The success probability of a game is the probability that the game eventually ends up in a winning state.

A game can be represented by a (possibly infinite) complete binary tree with labelled edges, where the two edges from the root are labelled $1 - p_1$ and $p_1$, the left child of the root is a game for $(x, y - p_1)$, and the right child of the root is a game for $(x - 1, y - p_1)$.

The success probability can be computed as follows. 

Fact 1. The success probability of a game has the following properties.

If $x = 0$ then the probability equals 1.

If $x \ge 1$ and $y = 0$ then the probability equals 0.

Otherwise, the success probability of a game $T$ is given by:
$$\Pr(T \text{ succeeds}) = \sum_i \prod_{j<i} (1 - p_j)\, p_i \Pr(T_i \text{ succeeds}), \qquad (1)$$
where $p_1$ is the first bet of the game, $p_i$ is the $i$th bet of the game if all previous bets were unsuccessful, and $T_i$ is the remaining game played when the $i$th bet is the first successful bet.

Note that $T_i$ is a game with goal $x - 1$ and fortune $y - \sum_{j \le i} p_j$. The (possibly infinite) number of terms in the sum in Equation (1) is the maximum number of bets performed by the game while the goal is $x$.

Note that as defined, for a given intermediate state $(x', y')$, the game may decide to bet different amounts, depending on the past history of the game from its starting point. Indeed, if we label vertices of the game tree by the current goal and fortune, there may be several vertices with the same label $(x', y')$, and each of them is the root of a game for $(x', y')$; these games may all be different from one another.

Definition 2. A memoryless game is a game such that at every step, the bet placed depends only on the current goal x' and on the current fortune y'. A strategy 𝓗 is a function (x, y) ↦ p, where x is a positive integer, y is a positive real number, and p ∈ (0, 1] is such that p ≤ y.

A memoryless game can naturally be extended into a strategy 𝓗 by defining 𝓗(x, y) = min(1, y) for every (x, y) which does not appear as a label of a tree node. Conversely, to any strategy there naturally corresponds a game for each (x, y), which proceeds as follows. Consider the current state (x, y). If x, y > 0, we place the bet p = 𝓗(x, y). With probability p, the bet is successful and the new goal is x' = x − 1; with probability 1 − p, the bet is unsuccessful and the goal is still x' = x. We then continue playing the game associated to 𝓗 on the new state (x', y − p).

If H(x, y) denotes the success probability of the game associated to strategy 𝓗, Equation 1 then becomes:

$$H(x,y) = \sum_{i} \Big( \prod_{j<i} (1 - p_j) \Big)\, p_i \, H\Big(x-1,\; y - \sum_{j \le i} p_j\Big), \qquad (2)$$

where p_1 = 𝓗(x, y), and in general p_i = 𝓗(x, y − Σ_{j<i} p_j) if y − Σ_{j<i} p_j is positive. When we talk about the success probability of a strategy, we refer to the success probability of the associated game.
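For a discrete strategy, Equation (2) can be evaluated by recursing one bet at a time, since each bet either moves the game to (x − 1, y − p) or keeps the goal x with fortune y − p. The Python sketch below is a minimal illustration under the assumption that the strategy's bets stay bounded away from zero (otherwise the recursion need not terminate); `strategy_success`, `all_in` and `halves` are hypothetical names.

```python
from fractions import Fraction
from functools import lru_cache
from typing import Callable

# A discrete strategy maps (goal, fortune) to a bet p in (0, 1] with p <= fortune.
Strategy = Callable[[int, Fraction], Fraction]


def strategy_success(strategy: Strategy, x: int, y: Fraction) -> Fraction:
    """Success probability H(x, y) of the game associated to `strategy`.

    Recurses one bet at a time, which unrolls to Equation (2): the bet p = strategy(x, y)
    leads to (x - 1, y - p) with probability p and to (x, y - p) with probability 1 - p.
    Termination is guaranteed when the bets are bounded away from zero.
    """

    @lru_cache(maxsize=None)
    def value(goal: int, fortune: Fraction) -> Fraction:
        if goal == 0:
            return Fraction(1)
        if fortune <= 0:
            return Fraction(0)
        p = strategy(goal, fortune)
        assert 0 < p <= min(1, fortune), "bets must lie in (0, 1] and not exceed the fortune"
        return p * value(goal - 1, fortune - p) + (1 - p) * value(goal, fortune - p)

    return value(x, Fraction(y))


def all_in(goal: int, fortune: Fraction) -> Fraction:
    """The strategy min(1, y) used above to extend memoryless games."""
    return min(Fraction(1), fortune)


def halves(goal: int, fortune: Fraction) -> Fraction:
    """Hypothetical example strategy: always bet min(1/2, y)."""
    return min(Fraction(1, 2), fortune)


print(strategy_success(all_in, 1, Fraction(1, 2)))  # a single bet of 1/2
print(strategy_success(halves, 2, Fraction(1)))     # two bets of 1/2, both must succeed
```

Memoizing on the state (goal, fortune) is exactly what memorylessness permits: the bet, and hence the success probability, depends only on the current state.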

Given x and y, we are interested in computing the supremum, over all games for (x, y), of the success probability of the game. Some easy cases may serve as a warmup: evidently this success probability equals 1 whenever y ≥ x, since it is then sufficient to place x bets each equal to 1. Another easy case is when x = 1 and y < 1: the supremum is then reached by the strategy which bets y, as we now explain.

Lemma 2. sup_{T game for (1, y)} Pr(T succeeds) = min(1, y).

Proof. Given that the fortune is y, the expected number of successes is always at most y, regardless of the game (each bet p contributes exactly p to this expectation, and the bets sum to at most y). Thus y is an upper bound on the probability that the number of successes is at least 1. This bound is reached by the strategy which makes a single bet equal to y. □
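The bound can be seen numerically for x = 1: splitting a fortune y ≤ 1 into k equal bets y/k keeps the expected number of successes at y, but the success probability becomes 1 − (1 − y/k)^k, which never exceeds the single bet y. The helper name below is hypothetical.

```python
from fractions import Fraction


def split_bets_goal_one(y: Fraction, k: int) -> Fraction:
    """Success probability for goal x = 1 when a fortune y <= 1 is spent as k equal bets y/k."""
    return 1 - (1 - y / k) ** k


y = Fraction(3, 5)
for k in (1, 2, 5, 10):
    p = split_bets_goal_one(y, k)
    # Expected successes are k * (y / k) = y, an upper bound on Pr(at least one success).
    print(k, p, float(p), p <= y)
```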

Definition 3. A continuous strategy is an extension of strategies which in addition is allowed to place infinitesimal bets, of the form: “repeat betting infinitesimal bets until there is a success or until the fortune spent equals p”, for some p ∈ (0, y].

We use the notation exp_z to mean the step: “repeat betting infinitesimal bets until there is a success or until the fortune spent equals z”. For consistency, a continuous strategy obviously has: if 𝓗(x, y) = exp_z, then 𝓗(x, y − t) = exp_{z−t} for every t ∈ (0, z]. This can be seen as the limit, as N tends to infinity, of the process which bets (1/N, ..., 1/N) up to zN times or until the first success. Since the binomial distribution converges to a Poisson process in the limit, the time to first success is distributed exponentially: for any real number t < z, we have:

$$\Pr(\text{fortune spent at the end of this step is} > t) = e^{-t},$$

and the probability that a success occurs during this step is 1 − e^{−z}. We will often use the term “discrete strategies” as a synonym for strategies, to contrast them with continuous strategies.
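The limit defining exp_z can be checked directly: with bets of 1/N, the probability of at least one success within ⌊zN⌋ bets is 1 − (1 − 1/N)^{⌊zN⌋}, which approaches 1 − e^{−z}. The Python sketch below (the function name is ours, for illustration) prints both values for growing N.

```python
import math


def discretized_exp_step(z: float, n: int) -> float:
    """Pr(at least one success) when betting 1/n up to floor(z * n) times.

    As n grows this tends to 1 - e^{-z}, the success probability of the step exp_z.
    """
    rounds = math.floor(z * n)
    return 1.0 - (1.0 - 1.0 / n) ** rounds


z = 0.5
for n in (10, 100, 10_000):
    print(n, discretized_exp_step(z, n), 1.0 - math.exp(-z))
```

The discrete value is always slightly larger than the limit, and the gap shrinks roughly like z e^{−z}/(2N).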
These definitions formalize the Best Strategy Theorem stated in the introduction. Theorem 1, which is our main result, follows from the Best Strategy Theorem as a simple corollary. 



3 Proof of the Best Strategy Theorem 

This section is devoted to the proof of Theorem 2 for y < x (the Theorem is obvious for y ≥ x). We will prove that F(x, y) = G(x, y). In subsection 3.1 we will prove that F(x, y) ≤ G(x, y). In subsection 3.2, we will prove that F(x, y) ≥ G(x, y).

3.1 The Upper Bound 

Definition 4. A game for (x, y) is finite if its game tree is finite. A strategy is 
finite if for every (x,y), the associated game is finite. 



Lemma 3. A game is finite if and only if each tree T_i in Fact 1 is finite, and the number of such trees is finite. A discrete strategy is finite if and only if, for every (x, y), the number of terms in the sum in Equation 2 is finite.

Proof. The statement of the lemma is obvious for games. As for the statement for strategies, one direction is obvious and the other one can be proved by induction on x. □

The following lemma shows a reduction from games to finite games. 

Lemma 4. Given x, y, ε and a discrete game T for (x, y), there exists a game U for (x, y) which is finite, and such that Pr(T succeeds) ≤ Pr(U succeeds) + xε.

Proof.

Given T, consider the following game U, which simulates T.

Game U to simulate the game tree T:

Let (p_j) be the sequence of bets which would be placed by T on (x, y) if every bet was unsuccessful.

If the i-th bet p_i satisfies Σ_{j≥i} p_j ≥ ε, then place the bet q = p_i, and

if the bet is successful, recursively simulate the game represented by the right subtree;

if not, recursively simulate the game represented by the left subtree.
Otherwise, play the game associated to the strategy which bets q = min(1, y).

The proof is by induction on x. Note that Σ_j p_j is at most y, hence the series converges. Let i_0 be the number of terms of the sum if that is finite, or else the smallest index such that Σ_{j>i_0} p_j < ε. Game U coincides with T for the first i_0 bets, and makes at most i_0 + 1 bets while the goal is x; hence U is a finite game by induction on x and by Lemma 3. A short calculation concludes the proof: the bets of T after the i_0-th one produce a success with probability at most Σ_{j>i_0} p_j < ε, and, by induction, each simulated subgame for goal x − 1 has success probability at least that of the corresponding subgame of T minus (x − 1)ε.
□
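The truncation used to build U can be sketched in Python. Here a finite list stands in for T's (possibly infinite) sequence of bets along the all-failure path, and `truncated_prefix` is a hypothetical name; it returns the first i_0 bets, which U copies before switching to the strategy min(1, y).

```python
from typing import List, Sequence


def truncated_prefix(bets: Sequence[float], eps: float) -> List[float]:
    """Bets that the finite game U copies from T along the all-failure path (Lemma 4).

    U places T's i-th bet p_i while the tail sum_{j >= i} p_j is at least eps, and then
    switches to the strategy min(1, y). The returned prefix has length i_0.
    """
    prefix: List[float] = []
    tail = sum(bets)
    for p in bets:
        if tail < eps:  # the remaining bets succeed with probability < eps (union bound)
            break
        prefix.append(p)
        tail -= p
    return prefix


geometric = [2.0 ** -j for j in range(1, 11)]  # 1/2, 1/4, ..., 1/1024
print(truncated_prefix(geometric, 0.1))
```

With ε = 0.1 the tail after the fourth bet is 1/16 − 1/1024 < 0.1, so U copies four bets and the discarded tail can contribute less than ε to T's success probability at this level.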
Fix ε > 0. Consider a game T for (x, y). By Lemma 4, there exists a finite game U such that Pr(T succeeds) ≤ Pr(U succeeds) + xε. Since this holds for every ε, we deduce that Pr(T succeeds) ≤ sup_{U finite game} Pr(U succeeds). Since this holds for every T, we deduce that F(x, y) = sup_{U finite game} Pr(U succeeds).

The following lemma shows a reduction from finite games to finite memoryless 
games. 

Lemma 5. Given x, y and a finite game T, there exists a finite game U which is memoryless, and such that Pr(T succeeds) ≤ Pr(U succeeds).

From Lemma 5, we get that F(x, y) = sup_{𝓗 finite strategy} H(x, y). To finish the proof of the upper bound, all we need is to prove the following Proposition, to which we will devote the rest of this section.

Proposition 1. 𝒢 is better than any finite strategy.

We start by observing that G is convex in its first argument.

Lemma 6 (Convexity). If x ≥ y + 1, then G(x + 1, y) + G(x − 1, y) ≥ 2G(x, y).

Proof. The proof uses induction on x + ⌈y⌉. The base case x = 1, y = 0 is easy. Consider the general case. Run the three processes G(x + 1, y), G(x, y) and G(x − 1, y) so as to couple the Poisson processes.

Case 1: If y ≤ x − 2, then all three processes start with an exponential waiting time to first success. We use straightforward induction on G for x − 1 and the remaining fortune at the time of first success.

Case 2: If y ∈ (x − 2, x − 1], then we let y' = x − 2 and observe the three processes until the remaining fortune is y'. As a shorthand, let G_z = G(z, y') for any z. Let a = y − y'. After some calculations, we get:

$$G(x-1,y) + G(x+1,y) - 2G(x,y) = e^{-a}\big(G_{x-1} + G_{x+1} - 2G_x\big) + a\,e^{-a}\big(1 + G_x - 2G_{x-1}\big).$$

Since G_{x−2} = G(x − 2, x − 2) = 1, the second bracket equals G_{x−2} + G_x − 2G_{x−1}. By the induction hypothesis for y' = x − 2 (noting that ⌈y'⌉ < ⌈y⌉), both quantities within brackets are non-negative, hence the lemma.

□ 
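The convexity of G can also be observed numerically. The Python sketch below tabulates G on a grid under an assumption that we read off the case analysis in the proof of Lemma 8 (the formal definition of 𝒢 is given in the introduction): 𝒢 bets 1 when y ≥ x, bets y − (x − 1) when x − 1 < y < x, and makes infinitesimal bets when y ≤ x − 1. The function names and the grid step are illustrative choices. The sketch reproduces two closed forms that follow from this recursion, G(2, 1) = 1/e and G(3, 2) = 1/e + 1/e², and checks Lemma 6 at a few points.

```python
import math
from typing import List


def best_strategy_table(max_goal: int, steps_per_unit: int = 1000) -> List[List[float]]:
    """Tabulate G(x, y) for 0 <= x <= max_goal and y = i / steps_per_unit in [0, max_goal].

    Assumes strategy G bets 1 when y >= x, bets u = y - (x - 1) when x - 1 < y < x, and
    makes infinitesimal bets when y <= x - 1, in which case
        G(x, y) = integral_0^y e^{-(y - t)} G(x - 1, t) dt   (trapezoidal rule below).
    """
    m = steps_per_unit
    size = max_goal * m + 1
    h = 1.0 / m
    decay = math.exp(-h)
    table = [[1.0] * size]                       # goal 0: already won
    for x in range(1, max_goal + 1):
        prev = table[x - 1]
        cur = [1.0] * size                       # y >= x: win with certainty
        integral = 0.0
        for i in range((x - 1) * m + 1):         # y <= x - 1: infinitesimal bets
            if i > 0:
                integral = decay * integral + 0.5 * h * (decay * prev[i - 1] + prev[i])
            cur[i] = integral
        base = cur[(x - 1) * m]                  # G(x, x - 1)
        for i in range((x - 1) * m + 1, x * m):  # x - 1 < y < x: a single bet u
            u = (i - (x - 1) * m) * h
            cur[i] = u + (1.0 - u) * base
        table.append(cur)
    return table


TABLE = best_strategy_table(max_goal=5)


def G(x: int, y: float) -> float:
    """Look up G(x, y) for y on the grid of step 1/1000."""
    return TABLE[x][round(y * 1000)]


print(G(2, 1.0), math.exp(-1))                 # closed form: 1/e
print(G(3, 2.0), math.exp(-1) + math.exp(-2))  # closed form: 1/e + 1/e^2

# Lemma 6 on a few grid points: G(x + 1, y) + G(x - 1, y) >= 2 G(x, y) for x >= y + 1.
for y in (0.25, 0.5, 1.0, 1.5, 2.0, 3.0):
    for x in range(math.ceil(y + 1), 5):
        assert G(x + 1, y) + G(x - 1, y) >= 2 * G(x, y), (x, y)
```

For x = 2 and y ∈ (0, 1], Case 2 applies with y' = 0 and a = y, and the gap G(1, y) + G(3, y) − 2G(2, y) equals y e^{−y}, which the table also reproduces.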

The following is a technical Lemma which will be used in the sequel. It uses the 
notion of continuous games, similar to the notion of continuous strategies. 

Definition 5. Given (x, y), a continuous game is defined recursively as follows: if x = 0, the game is a win; if x > 0 and y = 0, the game is lost. If x, y > 0, then the game consists of a (finite or infinite) sequence of steps, where step i consists either of a bet p_i > 0 or of the repetition of infinitesimal bets until there is a success or until the fortune spent equals p_i; we must have Σ_i p_i ≤ y. For each i, if step i was a positive bet p_i, then we also have a game for goal x − 1 and fortune y − Σ_{j≤i} p_j; if step i was a sequence of infinitesimal bets up to p_i, then we also have, for each t such that p_1 + ··· + p_{i−1} < t ≤ p_1 + ··· + p_i, a game for (x − 1, y − t).
Lemma 7. Let H and K be two continuous games for (x, y) which both go through a state where the remaining fortune is y' ≤ x − 2, after having had 0, 1 or 2 successes, and then continue with strategy 𝒢 after that point. Then K(x, y) ≥ H(x, y) if and only if the probability of having had 0 successes before y' is at least as large for K as for H.

Proof. Uses the convexity lemma. □

The following lemma is the core of the proof of Theorem 2.

Lemma 8. Consider a continuous game whose first bet is an arbitrary positive bet, and which then continues by using strategy 𝒢. Then its success probability is less than or equal to G(x, y).

Proof. The proof is by induction on x. If x = 0, then there is nothing to prove. Consider x ≥ 1. Let p be the first bet placed by the game on (x, y). Let K(x, y) denote the success probability of the game. There are several cases.

Case 1: y ≤ x − 1. Then 𝒢 starts by making infinitesimal bets.

Subcase 1.1: Assume y − p < x − 2. Let y' = y − p. We compare K to the following game L. L makes infinitesimal bets until the first success or until the fortune reaches y'; in the former case, let t be the remaining fortune at the time of first success: L then places a bet of t − y' to get to fortune y'. Once the fortune is y', L continues by following strategy 𝒢.

We appeal to Lemma 7 to compare K and L. The probability that K has had 0 successes by the time the fortune is y' is 1 − p. The probability that L has had 0 successes is e^{−p} ≥ 1 − p, hence L(x, y) ≥ K(x, y).

It is now easy to compare L to 𝒢:

$$L(x,y) = \int_0^p e^{-z}\, L'(x-1, y-z)\,dz + e^{-p}\, G(x, y-p).$$

Game L' places a first bet of (y − z) − y' and then continues using strategy 𝒢. By induction applied to x − 1 and L', we have L'(x − 1, y − z) ≤ G(x − 1, y − z). Thus

$$L(x,y) \le \int_0^p e^{-z}\, G(x-1, y-z)\,dz + e^{-p}\, G(x, y-p) = G(x,y).$$

Together, these inequalities imply K(x, y) ≤ L(x, y) ≤ G(x, y).

Subcase 1.2: Assume y − p ≥ x − 2. Let y' = x − 2. Our game K first bets p, bringing its fortune down to y − p, then applies 𝒢: if the first bet was successful, it bets r = (y − p) − y', bringing its fortune down to y'. If it was unsuccessful, it makes infinitesimal bets until a first success (when the remaining fortune is t) or until the fortune reaches y', and in the first case, bets t − y', bringing the fortune down to y'.

We compare K to the following game L: L makes infinitesimal bets until the first success (when the remaining fortune is t') or until the fortune reaches y', and in the former case, bets t' − y', bringing the fortune down to y'. Once the fortune is y', L continues by following strategy 𝒢.

We appeal to Lemma 7 to compare K and L. The probability that K has had 0 successes by the time the fortune is y' is (1 − p)e^{−(y−p−y')}. The probability that L has had 0 successes is e^{−(y−y')} = e^{−p}e^{−(y−p−y')} ≥ (1 − p)e^{−(y−p−y')}, hence L(x, y) ≥ K(x, y).

The comparison of L to 𝒢 is similar to Subcase 1.1.

Case 2: y > x − 1. (Of course, we still have y < x.) Then 𝒢 starts by betting y − (x − 1).

Subcase 2.1: p ≤ y − (x − 1). Let z = x − 1. Game K first bets p, bringing its fortune down to y − p, then applies 𝒢 by betting r = (y − p) − z to bring the fortune down to z, then continues applying 𝒢. We compare K to the game associated to strategy 𝒢: make a single bet of y − z to bring the fortune down to z, then continue applying 𝒢.

The winning probability of K is K(x, y) = p + (1 − p)r + (1 − p)(1 − r)G(x, z). The winning probability of 𝒢 is (p + r) + (1 − p − r)G(x, z), and one easily checks that 𝒢 is better than K: the difference is pr(1 − G(x, z)) ≥ 0.

Subcase 2.2: p > y − (x − 1). Let y' = x − 2. Game K bets p, bringing the fortune down to y − p, then, in case of success, bets (y − p) − y'; in case of failure, it makes infinitesimal bets until the first success (when the remaining fortune is t) or until the fortune reaches y', and in the former case, bets t − y'. Strategy 𝒢 first bets u = y − (x − 1), then, in case of success, bets 1 to bring the fortune down to y'; in case of failure, it makes infinitesimal bets until the first success (when the remaining fortune is t) or until the fortune reaches y', and in the former case, it bets t − y' to bring the fortune down to y'.

We appeal to Lemma 7 to compare K and 𝒢. The probability that K has had 0 successes by the time the fortune is y' is p_K = (1 − u − v)e^{−(1−v)}, where u = y − (x − 1) and u + v = p. The probability that 𝒢 has had 0 successes is p_𝒢 = (1 − u)e^{−1}. The ratio is

$$\frac{p_K}{p_{\mathcal{G}}} = \Big(1 - \frac{v}{1-u}\Big)\, e^{v} \le (1 - v)\, e^{v} \le e^{-v} e^{v} = 1,$$

hence 𝒢 is better than K.

□

Proposition 1 then follows by induction on N, the maximum number of steps of the finite strategy applied to (x, y), and by appealing to Lemma 8.

3.2 The Lower Bound

Lemma 9. Let T be a continuous game for (x, y). For each positive ε, there exists a discrete game U for (x, y) such that Pr(U succeeds) ≥ Pr(T succeeds) − xε.

Proof. We will compare T to the following randomized game U.

Game U to simulate T on (x, y):

If T places a positive bet p > 0, then bet p;

if successful, recursively simulate the right subtree of T;
if not, recursively simulate the left subtree of T.

Otherwise (T places an infinitesimal bet spending up to p), bet a = min(p, ε);
if unsuccessful, recursively simulate the game for (x, y − a);
otherwise, with probability [e^{−a} − (1 − a)]/a, recursively simulate the game for (x, y − a); and with the remaining probability, recursively simulate the game for (x − 1, y − τ), where the random variable τ ∈ [0, a] has density function e^{−t}/(1 − e^{−a}).

It is easy to see that U simulates T exactly (except for “giving away” a success with probability a × [e^{−a} − (1 − a)]/a = e^{−a} − (1 − a), so that U ends the step without a success with probability e^{−a}, just as T does) while spending only a little bit more fortune: every time T has a success, game U spends up to ε more fortune than T. But T needs only x successes to reach the goal: so, if T on (x, y) still has a remaining fortune of at least xε when it reaches its goal, then an initial fortune of y will be sufficient for game U. The probability that T reaches its goal while spending the last xε part of its fortune is bounded by the probability that it has one or more successes while spending that last xε of fortune; that is less than or equal to the expected number of successes during that time, i.e. less than or equal to xε. Hence Pr(U succeeds) ≥ Pr(T succeeds) − xε.

Finally, it is easy to de-randomize U: just pick some τ' such that the success probability of the game for (x − 1, y − τ') is greater than or equal to the expected value, over τ, of the success probability of the game for (x − 1, y − τ). We thus obtain a discrete game satisfying the Lemma. □
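The randomized step of U that replaces an infinitesimal bet can be sketched in Python; `simulate_infinitesimal_step` is a hypothetical name, and only the infinitesimal case is shown. With discard probability [e^{−a} − (1 − a)]/a, U ends the step without a success with probability (1 − a) + (e^{−a} − (1 − a)) = e^{−a}, and τ is drawn by inverting the distribution function (1 − e^{−t})/(1 − e^{−a}) on [0, a].

```python
import math
import random
from typing import Optional


def simulate_infinitesimal_step(a: float, rng: random.Random) -> Optional[float]:
    """One step of the discrete game U standing in for T's infinitesimal bets (Lemma 9).

    U places a single bet a = min(p, eps) > 0. Returns None when the simulated T has no
    success (U continues at goal x with fortune y - a); otherwise returns tau in [0, a],
    the fortune T would have spent before its first success (U then simulates T's game
    for (x - 1, y - tau), although its own fortune is only y - a).
    """
    give_away = (math.exp(-a) - (1.0 - a)) / a  # Pr(discard U's success | U succeeded)
    if rng.random() >= a:                        # U's bet failed
        return None
    if rng.random() < give_away:                 # U succeeded, but the success is given away
        return None
    # Inverse-CDF sample from the density e^{-t} / (1 - e^{-a}) on [0, a].
    return -math.log(1.0 - rng.random() * (1.0 - math.exp(-a)))


a = 0.3
rng = random.Random(2003)
outcomes = [simulate_infinitesimal_step(a, rng) for _ in range(100_000)]
no_success = sum(o is None for o in outcomes) / len(outcomes)
print(no_success, math.exp(-a))  # both estimate Pr(no success during T's step)
```

The empirical frequency of "no success" should be close to e^{−a}, which is what makes U's simulation of T exact apart from the extra fortune spent.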

From Lemma 9, we get that F(x, y) = sup_{T continuous game} Pr(T succeeds), which is obviously greater than or equal to G(x, y), and the proof of Theorem 2 is complete.

4 A Randomized Rounding Application 

In [6], Ravi presented an algorithm to build a spanning tree of small diameter and 
small maximum degree in a given graph. Here, using the framework of gambling 
games, we present a finer analysis of Ravi’s algorithm, thus improving on his 
approximation bounds. Here is the algorithm. 

Input: a graph G with vertex set V(G) and a bound Δ on the desired diameter.
Output: a spanning tree T of G.

Dynamic variables in the algorithm are a subgraph K of G and a set C ⊆ V of cluster centers.

1. Initialize K to a graph with vertex set V(G) and no edges; initialize C to V(G).

2. While there is more than one cluster center do:

(a) Set up an integer program of the type described in the Introduction, where:

i. For every path P of length at most Δ directed from one cluster center to another, there is a 0-1 variable x(P);

ii. For every cluster center c ∈ C there is a covering constraint of the form Σ x(P) = 1, where the sum is over all paths P directed out of c;

iii. For every vertex v there is a resource constraint of the form Σ x(P) ≤ z, where the sum is over paths P incident with vertex v.

(b) Solve the linear programming relaxation of this integer program (Ravi shows that this can be done in polynomial time);

(c) Use randomized rounding to obtain a feasible solution to the integer program, giving, for each cluster center, a path of length at most Δ directed to some other cluster center;

(d) Consider the graph H with one vertex for each cluster center in C and one directed edge (c, c') for each path in the solution to the integer program. Each vertex in H has out-degree exactly 1. By elementary graph theory, find a subgraph H' of H containing at least … edges and consisting of a disjoint union of “stars,” where each star consists of a root vertex c' and one or more vertices c such that (c, c') is an edge of H';

(e) For each edge (c, c') in H', add the corresponding path to K, and delete c from C, the set of cluster centers.

3. Let c be the unique remaining cluster center. By breadth-first search from c in K, construct a tree T spanning V(G).
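Step 2(c), together with the bets used in the proof of Theorem 4 below, can be made concrete with a small sketch. We assume the standard randomized rounding scheme in which each cluster center independently selects one of its paths P with probability x(P); the data layout (paths as vertex tuples keyed by their center) and the function names are hypothetical. `phase_bets` computes, for a fixed vertex v, the per-center bets of one phase of the gambling game.

```python
import random
from collections import defaultdict
from typing import Dict, Tuple

Vertex = str
Path = Tuple[Vertex, ...]  # a path listed from its cluster center to another center


def round_paths(fractional: Dict[Vertex, Dict[Path, float]],
                rng: random.Random) -> Dict[Vertex, Path]:
    """Randomized rounding as in step 2(c): each center independently picks one path.

    `fractional[c]` maps the paths directed out of center c to their LP values x(P),
    which sum to 1 by the covering constraint; path P is chosen with probability x(P).
    """
    chosen: Dict[Vertex, Path] = {}
    for center, paths in fractional.items():
        options = list(paths)
        chosen[center] = rng.choices(options, weights=[paths[q] for q in options])[0]
    return chosen


def phase_bets(fractional: Dict[Vertex, Dict[Path, float]], v: Vertex) -> Dict[Vertex, float]:
    """Bets of vertex v in one phase of the gambling game of Theorem 4:
    for each center, the probability that its rounded path passes through v."""
    bets: Dict[Vertex, float] = defaultdict(float)
    for center, paths in fractional.items():
        for path, value in paths.items():
            if v in path:
                bets[center] += value
    return dict(bets)


lp = {
    "a": {("a", "u", "b"): 0.5, ("a", "w", "b"): 0.5},
    "b": {("b", "u", "a"): 1.0},
}
print(round_paths(lp, random.Random(0)))
print(phase_bets(lp, "u"))  # the bets sum to the LP load on u
```

In this toy LP, vertex u lies on one of the two paths of center a and on the only path of center b, so its bets in this phase are 0.5 and 1.0; their sum is the fractional load on u that the LP bounds.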



Theorem 3 (Ravi). Assume that G has a spanning tree T* of diameter at most Δ and maximum degree at most D*. Then with high probability the above algorithm will produce a spanning tree of height O(Δ log n) and maximum degree O(D* log n + log² n).

Proof. The height of T equals the height of K. Since the “while” loop is executed O(log n) times, each vertex v ∈ G is linked to c in K by a sequence of at most O(log n) flow paths, each of length at most Δ. Hence T has height O(Δ log n). Using T*, it is easy to construct a multicommodity flow of length at most Δ and value at most 2D*. Hence the solution of the LP in step 2(b) has value at most 2D*. By the randomized rounding Theorem (which is based on a Hoeffding bound), the integral multicommodity flow obtained in step 2(c) has value at most 2D* + O(log n) with high probability, and so the union of the flow paths added in step 2(e) also has maximum degree at most 2D* + O(log n). Since the “while” loop is executed O(log n) times, the resulting graph K has maximum degree O(D* log n + log² n) (with high probability), and hence the output T also has maximum degree O(D* log n + log² n) (with high probability). □

We will use our gambling game to provide a more refined analysis of Ravi’s 
algorithm. 

Theorem 4. Assume that G has a spanning tree T* of diameter at most Δ and maximum degree at most D*. Then the above algorithm will produce a spanning tree of height O(Δ log n) and maximum degree O(D* log n) (with high probability).

Proof. Fix a vertex v of G. We play the gambling game: the initial fortune is 2D*t, where t is the number of integer programs solved in the algorithm, and the goal is 3aD* log n, where a is chosen to guarantee that t ≤ a log n. The bets are done in phases corresponding to the successive integer programs in the algorithm. In each phase there is a bet for each cluster center, equal to the probability that vertex v will lie in the path from that cluster center selected by the integer program. The sum of these bets is just the sum of the fractional variables in the linear program corresponding to directed paths passing through v. Since the value of the linear program is at most 2D*, the sum of the bets in each phase is at most 2D*, and the sum of all bets does not exceed the fortune 2D*t. The degree of v in the tree T is at most twice the number of selected paths through v in the course of the algorithm, and the number of selected paths is equal to the number of successes in the gambling game. Hence the probability that the degree of v in T is greater than or equal to 6aD* log n is at most F(3aD* log n, 2aD* log n), which can be shown to be exponentially small in aD* log n, and hence polynomially small in n. □
Abstract. In this paper we consider the problem of testing bipartiteness of general graphs. The problem has previously been studied in two models, one most suitable for dense graphs, and one most suitable for bounded-degree graphs. Roughly speaking, dense graphs can be tested for bipartiteness with constant complexity, while the complexity of testing bounded-degree graphs is Θ̃(√n), where n is the number of vertices in the graph. Thus there is a large gap between the complexity of testing in the two cases.

In this work we bridge the gap described above. In particular, we study the problem of testing bipartiteness in a model that is suitable for all densities. We present an algorithm whose complexity is Õ(min(√n, n²/m)), where m is the number of edges in the graph, and match it with an almost tight lower bound.



1 Introduction 

Property testing algorithms [16, 8] are algorithms that perform approximate decisions. 
Namely, for a predetermined property P they should decide whether a given object O 
has property P or is far from having property P. In order to perform this approximate 
decision they are given query access to the object O. Property testing problems are 
hence defined by the type of objects in question, the property tested, the type of queries 
allowed, and the notion of distance to having a property. Much of the focus of property 
testing has been on testing properties of graphs. In this context several models have 
been considered. In all models, for a fixed graph property P, the algorithm is required to 
accept graphs that have P and to reject graphs that are ε-far from having P, for a given distance parameter ε. In all cases the algorithm is allowed a constant probability of failure. The models differ in the type of queries they allow and in the notion of distance they use (which underlies the definition of being ε-far from having the property). The complexity of the algorithm is measured by the number of queries to the object O it performs.



1.1 Models for Testing Graph Properties 

The first model, introduced in [8], is the adjacency-matrix model. In this model the 
algorithm may perform queries of the form: “Is there an edge between vertices u and v 
in the graph?” That is, the algorithm may probe the adjacency matrix representing the 
graph. We refer to such queries as vertex-pair queries. The notion of distance is also linked to this representation: a graph is said to be ε-far from having property P if more than εn² edge modifications should be performed on the graph so that it obtains the property, where n is the number of vertices in the graph. In other words, ε measures the fraction of entries in the adjacency matrix of the graph that should be modified. This model is most suitable for dense graphs in which the number of edges m is Θ(n²). This model was studied in [8, 3, 2, 1, 4, 11, 7].

The second model, introduced in [9], is the (bounded-degree) incidence-lists model. 
In this model, the algorithm may perform queries of the form: “Who is the i’th neighbor 
of vertex v in the graph?” That is, the algorithm may probe the incidence lists of the 
vertices in the graph, where it is assumed that all vertices have degree at most d for 
some fixed degree-bound d. We refer to these queries as neighbor queries. Here too the notion of distance is linked to the representation: a graph is said to be ε-far from having property P if more than εdn edge modifications should be performed on the graph so that it obtains the property. In this case ε measures the fraction of entries in the incidence-lists representation (among all dn entries) that should be modified. This model is most suitable for graphs with m = Θ(dn) edges; that is, whose maximum degree is of the same order as the average degree. In particular, this is true for sparse graphs that have constant degree. This model was studied in [10, 9, 6].

In [15] it was suggested to decouple the questions of representation and type of 
queries allowed from the definition of distance to having a property. Specifically, it was 
suggested to measure the distance simply with respect to the number of edges, denoted m, in the graph. Namely, a graph is said to be ε-far from having a property if more than εm edge modifications should be performed so that it obtains the property. In [15] the 
algorithm was allowed the same type of queries as in the bounded-degree incidence-lists 
model, but no fixed upper-bound was assumed on the degrees and the algorithm could 
query the degree of any vertex. The main advantage of this model over the bounded- 
degree incidence-lists model is that it is suitable for graphs whose degrees may vary 
significantly. 

The Model Studied in this Paper. In this work we are interested in a model that may be 
useful for testing all types of graphs: dense, sparse, and graphs that lie in-between the 
two extremes. As is discussed in more detail in the next subsection, the two extremes 
sometimes exhibit very different behavior in terms of the complexity of testing the same 
property. We are interested in understanding the transformation from testing sparse (and 
in particular bounded-degree) graphs to testing dense graphs. 
Recall that a model for testing graph properties is defined by the distance measure used and by the queries allowed. The model of [15] is indeed suitable for all graphs in terms of the distance measure used, since distance is measured with respect to the actual number of edges m in the graph.¹ Thus this notion of distance adapts itself to the density of the graph, and we shall use it in our work.

The focus in [15] was on testing properties that are of interest in sparse (but not 
necessarily bounded-degree) graphs, and hence they allowed only neighbor queries. 
However, consider the case in which the graph is not sparse (but not necessarily dense). In particular, suppose that the graph has … edges, and that we are seeking an algorithm that performs o(√n) queries. While in the case of sparse graphs, there is 
no use in asking vertex-pair queries (i.e., is there an edge between a particular pair of 
vertices), such queries may become helpful when the number of edges is sufficiently 
large. Hence, we allow our algorithms to perform both neighbor queries and vertex-pair 
queries. 
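The query types discussed above can be summarized by a small oracle that counts how many queries an algorithm makes. The class below is a hypothetical illustration, not an interface defined in [15] or in this paper; its query counter corresponds to the complexity measure used throughout.

```python
from typing import Dict, Optional, Set


class GraphOracle:
    """Hypothetical query interface to an undirected graph that counts every query.

    Offers degree queries, neighbor queries ("who is the i'th neighbor of v?") and
    vertex-pair queries ("is there an edge between u and v?").
    """

    def __init__(self, adjacency: Dict[int, Set[int]]):
        self._lists = {v: sorted(nbrs) for v, nbrs in adjacency.items()}
        self._sets = {v: set(nbrs) for v, nbrs in adjacency.items()}
        self.queries = 0

    def degree(self, v: int) -> int:
        self.queries += 1
        return len(self._lists[v])

    def neighbor(self, v: int, i: int) -> Optional[int]:
        """The i'th neighbor of v (0-based), or None if v has at most i neighbors."""
        self.queries += 1
        nbrs = self._lists[v]
        return nbrs[i] if i < len(nbrs) else None

    def is_edge(self, u: int, v: int) -> bool:
        self.queries += 1
        return v in self._sets[u]


# A triangle 0-1-2 with a pendant vertex 3 attached to 2.
oracle = GraphOracle({0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}})
print(oracle.degree(2), oracle.neighbor(2, 2), oracle.is_edge(0, 3), oracle.queries)
```

In the sparse regime a vertex-pair query almost always answers "no", which is why neighbor queries are the useful ones there; the counter makes this trade-off easy to measure.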

1.2 Testing Bipartiteness 

One of the properties that has received quite a bit of attention in the context of property 
testing, is bipartiteness. Recall that a graph is bipartite if it is possible to partition its 
vertices into two parts such that there are no edges with both endpoints in the same 
part. This property was first studied in [8], where it was shown that bipartiteness can be tested by a simple algorithm using Õ(1/ε³) queries. This was improved in [3] to Õ(1/ε²) queries. The best lower bound known in this model is Ω(1/ε^{1.5}), due to [7]. Thus the complexity of this problem is independent of the number of vertices n and polynomial in 1/ε.
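The dense-model approach of sampling vertices and checking the induced subgraph can be sketched as follows. This is a simplified illustration in the spirit of the algorithm of [8], not a reproduction of it: the sample size is left as a free parameter, and the bounds relating it to ε are not modeled. The helper names are hypothetical.

```python
import random
from collections import deque
from itertools import combinations
from typing import Callable, Dict, List


def is_bipartite(vertices: List[int], adj: Dict[int, List[int]]) -> bool:
    """BFS 2-coloring; returns False exactly when the graph has an odd cycle."""
    color: Dict[int, int] = {}
    for source in vertices:
        if source in color:
            continue
        color[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]
                    queue.append(w)
                elif color[w] == color[u]:
                    return False
    return True


def sample_and_check(n: int, is_edge: Callable[[int, int], bool], sample_size: int,
                     rng: random.Random) -> bool:
    """Accept iff the subgraph induced on a uniform vertex sample is bipartite.

    Uses only vertex-pair queries, one per pair of sampled vertices. A bipartite graph
    is always accepted, since its induced subgraphs are bipartite (one-sided error).
    """
    sample = rng.sample(range(n), min(sample_size, n))
    adj: Dict[int, List[int]] = {v: [] for v in sample}
    for u, v in combinations(sample, 2):
        if is_edge(u, v):
            adj[u].append(v)
            adj[v].append(u)
    return is_bipartite(sample, adj)


def complete_bipartite(u: int, v: int) -> bool:
    return u % 2 != v % 2  # K_{50,50} when n = 100


def complete(u: int, v: int) -> bool:
    return u != v  # K_100 when n = 100


rng = random.Random(1)
print(sample_and_check(100, complete_bipartite, 20, rng), sample_and_check(100, complete, 20, rng))
```

The number of queries is quadratic in the sample size and independent of n, which is the source of the constant (in n) complexity in this model.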

The complexity of testing bipartiteness changes significantly when considering the 
bounded-degree incidence-lists model. In [10] a lower bound of Ω(√n) is established in this model, for constant ε and d (the degree bound). An almost matching upper bound of Õ(√n · poly(1/ε)) is shown in [9]. Thus, in the case of bipartiteness there is a large 
gap between the results that can be obtained for dense graphs and for constant-degree 
graphs. Here we venture into the land of graphs that are neither necessarily sparse, 
nor necessarily dense, and study the complexity of testing bipartiteness. Other graph 
properties exhibit similar (and sometimes even larger) gaps, and hence we believe that 
understanding the transformation from sparse to dense graphs is of general interest. 

1.3 Our Results 

In this work we present two complementary results for n-vertex graphs having m edges: 

• We describe and analyze an algorithm for testing bipartiteness in general graphs whose query complexity (and running time) is O(min(√n, n²/m) · poly(log n/ε)). The algorithm has a one-sided error (i.e., it always accepts bipartite graphs). Furthermore, whenever it rejects a graph it provides evidence that the graph is not bipartite in the form of an odd-length cycle of length poly(log n/ε).

• We present an almost matching lower bound of Ω(min(√n, n²/m)) (for a constant ε). This bound holds for all testing algorithms (that is, for those which are allowed a two-sided error and are adaptive). Furthermore, the bound holds for regular graphs.

¹ We assume for simplicity that the number of vertices, n, and the number of edges, m, are both given to the testing algorithm. If they are not known exactly, the algorithm can work using upper bounds on these values. The tightness of these bounds will naturally affect the performance of the algorithm.

As seen from the above expressions, as long as m = O(n^{1.5}), that is, the average degree is O(√n), the complexity of testing is Θ̃(√n). Once the number of edges goes above n^{1.5}, we start seeing a decrease in the query complexity, which in this case is at most O((n²/m) · poly(log n/ε)). In terms of our algorithm, this is exactly the point where 
our algorithm starts exploiting its access to vertex-pair queries. Our lower bound shows 
that this behavior of the query complexity is not only an artifact of our algorithm but is 
inherent in the problem. 

Note that even if the graph is sparse, we obtain a new result that does not follow from [9]. Namely, we have an algorithm with complexity Õ(√n · poly(1/ε)) for sparse graphs with varying degrees.

1.4 Our Techniques 

We present our algorithm in two stages. First we describe an algorithm that works for 
almost-regular graphs, that is, graphs in which the maximum degree is of the same order 
as the average degree. The algorithm and its analysis closely follow the algorithm and 
analysis in [9]. Indeed, as long as the degree d of the graph is at most √n, we execute the [9] algorithm. The place where we depart from [9] is in the usage of vertex-pair queries once d > √n. We refer to our first algorithm as Test-Bipartite-Reg.

In the second stage we show how to reduce the problem of testing bipartiteness of 
general graphs to bipartiteness of almost-regular graphs. Namely, we show how, for 
every given graph G, it is possible to define a graph G' such that: (1) G' has roughly 
the same number of vertices and edges as G, and its maximum degree is of the same 
order as its average degree (which is roughly the same as the average degree in G); (2) 
If G is bipartite then so is G', and if G is far from bipartite then so is G'. We then show 
how to emulate the execution of the algorithm Test-Bipartite-Reg on G' given query 
access to G, so that we may accept G if it accepts G', and reject G if it rejects G' . 

In the course of this emulation we are confronted with the following interesting 
problem: We would like to sample vertices in G according to their degrees (which aids 
us in sampling vertices uniformly in G', a basic operation that is required by Test-Bipartite-Reg). The former is equivalent to sampling edges uniformly in G. In order not to harm the performance of our testing algorithm, we are required to perform this task in O(min(√n, n²/m)) queries. If m is sufficiently large (once again, if m ≥ n^{1.5}), this can be performed simply by sampling sufficiently many pairs of vertices in G. However, we do not know how to perform this task exactly (in an efficient manner) when the number of edges is significantly smaller than n^{1.5}. Nonetheless, we provide a sampling procedure that selects edges according to a distribution that approximates the desired uniform distribution on edges, and is sufficient for our purposes. The approximation is such that for all but a small fraction of the m edges, the probability of selecting an edge is Ω(1/m). This procedure may be of independent interest.
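When m is large, sampling uniform edges through vertex-pair queries amounts to rejection sampling. A minimal Python sketch, assuming only vertex-pair queries (the function name and the `max_queries` cutoff are hypothetical); it covers only the easy dense case, not the approximate procedure described above:

```python
import random
from typing import Callable, Tuple


def sample_edge_by_pairs(n: int, is_edge: Callable[[int, int], bool], rng: random.Random,
                         max_queries: int = 10**6) -> Tuple[Tuple[int, int], int]:
    """Uniformly random edge via vertex-pair queries only (rejection sampling).

    A uniformly random pair {u, v} is an edge with probability m / C(n, 2), so about
    C(n, 2) / m = O(n^2 / m) queries are expected; given acceptance, every edge is
    equally likely. Returns the edge (as a sorted pair) and the number of queries used.
    """
    for queries in range(1, max_queries + 1):
        u, v = rng.sample(range(n), 2)
        if is_edge(u, v):
            return (min(u, v), max(u, v)), queries
    raise RuntimeError("no edge found within max_queries pair queries")


EDGES = {(0, 1), (2, 3)}


def in_edges(u: int, v: int) -> bool:
    return (min(u, v), max(u, v)) in EDGES


rng = random.Random(0)
counts = {e: 0 for e in EDGES}
for _ in range(4000):
    edge, _queries = sample_edge_by_pairs(4, in_edges, rng)
    counts[edge] += 1
print(counts)  # roughly equal counts for the two edges
```

Taking a uniformly random endpoint of the returned edge samples a vertex with probability proportional to its degree. For sparser graphs the expected C(n, 2)/m queries exceed the O(min(√n, n²/m)) budget, which is why an approximate procedure is needed there.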
We also conjecture that variants of our construction of G' (and in particular a probabilistic construction we suggest in the long version of this paper [12]) may be useful in transforming other results that hold for graphs whose maximum degree is similar to their average degree, to results that hold for graphs with varying degrees.

We establish our lower bound by describing, for every pair n, d (n even, $d > 64$),
two distributions over d-regular graphs. In one distribution all graphs are bipartite by
construction. For the other distribution we prove that almost all graphs are far from
bipartite. We then show that every testing algorithm that can distinguish between a
graph chosen randomly from the first distribution (which it should accept with
probability at least 2/3), and a graph chosen randomly from the second distribution (which
it should reject with probability at least 2/3), must perform $\Omega(\min(\sqrt{n}, n/d)) =
\Omega(\min(\sqrt{n}, n^2/m))$ queries. In the lower bound proof we show the necessity of both
neighbor queries and vertex-pair queries. Specifically, if only one type of query is used,
the lower bound increases.



1.5 Further Research 

As noted previously, there are other problems that exhibit a significant gap between
the query complexity of testing dense graphs (in the adjacency-matrix model) and the
complexity of testing sparse, bounded-degree graphs (in the bounded-degree
incidence-lists model). In particular this is true for testing $k$-colorability. It is possible to test dense
graphs for $k$-colorability using $\mathrm{poly}(k/\epsilon)$ queries [8, 3], while testing sparse graphs
requires $\Omega(n)$ queries [6]. We stress that these bounds are for query complexity, where
we put time complexity aside. We would like to understand this transformation from
essentially constant complexity (for constant $k$ and $\epsilon$) to linear complexity, and we
would like to know whether any intermediate results can be obtained for graphs that
are neither sparse nor dense. Other problems of interest are testing whether a graph has
a relatively large clique [8], testing acyclicity of directed graphs [5], and testing that a
graph does not contain a certain subgraph [1].



2 Preliminaries 

Let $G = (V,E)$ be an undirected graph with $n$ vertices labeled $1, \ldots, n$, and let $m =
m(G) = |E(G)|$ be the total number of edges in $G$. Unless stated otherwise, we assume
that $G$ contains no multiple edges. For each vertex $v \in V$ let $\Gamma(v)$ denote its set of
neighbors, and let $\deg(v) = |\Gamma(v)|$ denote its degree. The edges incident to $v$ (and
their end-points, the neighbors of $v$) are labeled from 1 to $\deg(v)$. Note that each edge
has two, possibly different, labels, one with respect to each of its end-points. We hence
view edges as quadruples. That is, if there is an edge between $v$ and $u$, and it is the
$i$-th edge incident to $v$ and the $j$-th edge incident to $u$, then this edge is denoted by
$(u,v,i,j)$. When we want to distinguish between the quadruple $(u,v,i,j)$ and the pair
$(u,v)$, we refer to the latter as an edge-pair. We let $d_{\max} = d_{\max}(G)$ denote the
maximum degree in the graph $G$ and $d_{\mathrm{avg}} = d_{\mathrm{avg}}(G)$ denote the average degree in the
graph (that is, $d_{\mathrm{avg}}(G) = 2m(G)/n$).
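To make the quadruple notation concrete, here is a minimal Python sketch. It assumes a hypothetical representation in which each vertex maps to the ordered list of its neighbors (the 1-indexed position in the list is the label), and it reads $(u,v,i,j)$ as "v is the $i$-th neighbor of $u$ and $u$ is the $j$-th neighbor of $v$", the convention used in Section 4.1.

```python
def edge_quadruples(adj):
    """Return the quadruples (u, v, i, j) of a simple undirected graph.

    adj maps each vertex to the ordered list of its neighbors; the
    1-indexed position of a neighbor in that list is its label.
    Convention assumed here: v is the i-th neighbor of u and
    u is the j-th neighbor of v.
    """
    # label[v][u] = position of u in v's neighbor list (1-indexed)
    label = {v: {w: k for k, w in enumerate(nbrs, start=1)}
             for v, nbrs in adj.items()}
    quads = set()
    for u, nbrs in adj.items():
        for i, v in enumerate(nbrs, start=1):
            if u < v:  # report each edge once
                quads.add((u, v, i, label[v][u]))
    return quads


def edge_pairs(quads):
    """Forget the labels and keep only the edge-pairs (u, v)."""
    return {(u, v) for (u, v, _, _) in quads}


# Path 1 - 2 - 3, where vertex 2 happens to list its neighbors as [3, 1].
adj = {1: [2], 2: [3, 1], 3: [2]}
quads = edge_quadruples(adj)
print(sorted(quads))                 # [(1, 2, 1, 2), (2, 3, 1, 1)]
m = len(quads)
print(2 * m / len(adj))              # average degree 2m/n
```

The same edge-pair (1, 2) carries label 1 at vertex 1 but label 2 at vertex 2, which is exactly why the quadruple form is needed.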
Distance to having a property. Consider a fixed graph property $\mathcal{P}$. For a given graph
$G$, let $e_{\mathcal{P}}(G)$ be the minimum number of edges that should be added to $G$ or removed
from $G$ so that it obtains property $\mathcal{P}$. The distance of $G$ to having property $\mathcal{P}$ is defined
as $e_{\mathcal{P}}(G)/m(G)$. In particular, we say that graph $G$ is $\epsilon$-far from having the property $\mathcal{P}$
for a given distance parameter $0 < \epsilon < 1$, if $e_{\mathcal{P}}(G) > \epsilon \cdot m(G)$. Otherwise, it is $\epsilon$-close
to having property $\mathcal{P}$. In some cases we may define the distance to having a property
with respect to an upper bound $m_{\max} \ge m(G)$ on the number of edges in the graph
(that is, the distance to having property $\mathcal{P}$ is defined as $e_{\mathcal{P}}(G)/m_{\max}$). For example, if
the graph is dense, so that $m(G) = \Omega(n^2)$, then we set $m_{\max} = n^2$, and alternatively, if
the graph has some bounded degree $d$, then we set $m_{\max} = d \cdot n$. (In the latter case we
could set $m_{\max} = (d \cdot n)/2$, but for simplicity we set the slightly higher upper bound.)
If $e_{\mathcal{P}}(G)/m_{\max} > \epsilon$ then we shall say that the graph is $\epsilon$-far from property $\mathcal{P}$ with
respect to $m_{\max}$.

Testing algorithms. A testing algorithm for a graph property $\mathcal{P}$ is required to accept
with probability at least 2/3 every graph that has property $\mathcal{P}$ and to reject with
probability at least 2/3 every graph that is $\epsilon$-far from having property $\mathcal{P}$, where $\epsilon$ is a given
distance parameter. If the algorithm always accepts graphs that have the property then
it is a one-sided error algorithm. The testing algorithm is given the number of vertices
in the graph, the number of edges in the graph, or an upper bound on this number, and
it is provided with query access to the graph. Specifically, we allow the algorithm the
following types of queries.

• The first type of queries are degree queries. That is, for any vertex $u$ of its choice,
the algorithm can obtain $\deg(u)$. We assume that a degree query has cost one. In
fact it can be easily implemented using neighbor queries with cost $O(\log d_{\max}) =
O(\log n)$.

• The second type of queries are neighbor queries. Namely, for every vertex $u$ and
index $1 \le i \le \deg(u)$, the algorithm may obtain the $i$-th neighbor of vertex $u$.

• The third type of queries are vertex-pair queries. Namely, for any pair of vertices
$(u, v)$, the algorithm can query whether there is an edge between $u$ and $v$ in $G$.
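The remark that a degree query can be simulated by neighbor queries can be sketched as follows. The sketch assumes a hypothetical convention in which a neighbor query with an index larger than $\deg(u)$ returns a special "no such neighbor" answer (None below). It first doubles the index until it overshoots and then binary-searches, so it uses $O(\log \deg(u)) \subseteq O(\log d_{\max})$ queries.

```python
def degree_via_neighbor_queries(neighbor_query, u):
    """Compute deg(u) with O(log deg(u)) neighbor queries.

    neighbor_query(u, i) is assumed to return the i-th neighbor of u
    (1-indexed) when 1 <= i <= deg(u), and None otherwise.
    """
    if neighbor_query(u, 1) is None:
        return 0
    # Galloping phase: double hi until index hi is out of range.
    lo, hi = 1, 2                     # invariant: index lo is valid
    while neighbor_query(u, hi) is not None:
        lo, hi = hi, 2 * hi
    # Binary search for the last valid index in [lo, hi).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if neighbor_query(u, mid) is None:
            hi = mid
        else:
            lo = mid
    return lo


adj = {0: list(range(1, 38))}         # vertex 0 has degree 37


def neighbor_query(u, i):
    nbrs = adj[u]
    return nbrs[i - 1] if 1 <= i <= len(nbrs) else None


print(degree_via_neighbor_queries(neighbor_query, 0))  # 37
```

For a vertex of degree $D \ge 1$ this makes exactly $2\lceil \log_2(D+1) \rceil$ queries at most (one doubling probe and one bisection probe per bit of $D$).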



Bipartiteness. In this work we focus on the property of bipartiteness. Let $(V_1, V_2)$ be
a partition of $V$. We say that an edge $(u, v) \in E$ is a violating edge with respect to
$(V_1, V_2)$ if $u$ and $v$ belong to the same subset $V_b$ (for some $b \in \{1,2\}$). A graph is
bipartite if there exists a partition of its vertices with respect to which there are no
violating edges. By definition, a graph is $\epsilon$-far from bipartite if for every partition of its
vertices, the number of violating edges with respect to the partition is greater than $\epsilon \cdot m$.
Recall that a graph is bipartite if and only if it contains no odd-length cycles.
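For very small graphs the distance to bipartiteness can be computed directly from this definition. The following brute-force Python sketch (exponential in $n$, for illustration only; the function names are illustrative) counts violating edges for every partition. Since adding edges never removes a violation, the minimum equals $e_{\mathcal{P}}(G)$ for bipartiteness.

```python
from itertools import product


def violating_edges(edges, side):
    """Edges whose end-points lie in the same part; side[v] is 0 (V_1) or 1 (V_2)."""
    return sum(1 for u, v in edges if side[u] == side[v])


def distance_to_bipartite(vertices, edges):
    """Minimum number of violating edges over all 2^n partitions.

    Brute force, so only for tiny graphs. For bipartiteness, adding edges
    never helps, so this minimum equals e_P(G).
    """
    vertices = list(vertices)
    best = len(edges)
    for bits in product((0, 1), repeat=len(vertices)):
        best = min(best, violating_edges(edges, dict(zip(vertices, bits))))
    return best


def is_eps_far(vertices, edges, eps, m_max=None):
    """eps-far w.r.t. m(G) by default, or w.r.t. an upper bound m_max."""
    denom = len(edges) if m_max is None else m_max
    return distance_to_bipartite(vertices, edges) > eps * denom


triangle = [(0, 1), (1, 2), (0, 2)]
print(distance_to_bipartite(range(3), triangle))      # 1
print(is_eps_far(range(3), triangle, 0.3))            # True  (1 > 0.3 * 3)
print(is_eps_far(range(3), triangle, 0.3, m_max=6))   # False (1 <= 0.3 * 6)
```

The triangle is $0.3$-far with respect to $m = 3$ but not with respect to $m_{\max} = d_{\max} n = 6$, which illustrates why the choice of normalization matters.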



3 The Algorithm for the Almost-Regular Case 

In this section we describe an algorithm that accepts every bipartite graph and that
rejects with probability at least 2/3 every graph that is $\epsilon$-far from bipartite with respect
to an upper bound $m_{\max} = d_{\max} n$ on the number of edges. Namely, this algorithm



Tight Bounds for Testing Bipartiteness in General Graphs 






rejects (with probability at least 2/3) graphs for which the number of edges that need
to be removed so that they become bipartite is greater than $\epsilon \cdot m_{\max} = \epsilon \cdot d_{\max} n$.
The query complexity (and running time) of this algorithm is $O(\min(\sqrt{n}, n/d_{\max}) \cdot
\mathrm{poly}(\log n/\epsilon))$.

In the case where the graph is almost-regular, that is, the maximum degree of the
graph $d_{\max}$ is of the same order as the average degree $d_{\mathrm{avg}}$, then we essentially obtain
a tester as desired (since in such a case $\epsilon d_{\max} n = O(\epsilon m)$). However, in general, $d_{\max}$
may be much larger than $d_{\mathrm{avg}}$ (for example, it is possible that $d_{\max} = \Theta(n)$ while $d_{\mathrm{avg}} =
\Theta(1)$). To deal with the general case we show in the next section (Section 4) how to
reduce the problem in the general case to the special case of $d_{\max} = O(d_{\mathrm{avg}})$.

A High Level Description of the Algorithm. Throughout this section let $d = d_{\max}$.
Our algorithm builds on the testing algorithm for bipartiteness described in [9], whose
query complexity is $O(\sqrt{n} \cdot \mathrm{poly}(\log n/\epsilon))$ (and which works with respect to $m_{\max} =
dn$ as well). In fact, as long as $d \le \sqrt{n}$ our algorithm is equivalent to the algorithm
in [9]. In particular, as in [9], our algorithm selects $O(1/\epsilon)$ starting vertices and from
each it performs several random walks (using neighbor queries), each walk of length
$\mathrm{poly}(\log n/\epsilon)$. If $d \le \sqrt{n}$ then the number of these walks is $O(\sqrt{n} \cdot \mathrm{poly}(\log n/\epsilon))$, and
the algorithm simply checks whether an odd-length cycle was detected in the course of
these random walks (possibly relying on information from more than one random walk
to find an odd cycle).

If $d > \sqrt{n}$ then there are two important modifications: (1) The number of random
walks performed from each vertex is reduced to $O(\sqrt{n/d} \cdot \mathrm{poly}(\log n/\epsilon))$; (2) For each
pair of end vertices reached in these walks with the same parity, the algorithm performs
a vertex-pair query (so the number of vertex-pair queries is quadratic in the number of walks,
that is, $O((n/d) \cdot \mathrm{poly}(\log n/\epsilon))$). Similarly to the $d \le \sqrt{n}$ case, the graph is rejected if an odd-length
cycle is found in the subgraph induced by all queries performed. Pseudo-code for the
algorithm is shown in Figure 1.

Random Walks and Paths in the Graph. The random walks performed are defined as
follows: At each step, if the degree of the current vertex $u$ is $d' \le d$, then the walk
remains at $u$ with probability $1 - \frac{d'}{2d} \ge \frac{1}{2}$, and for each $w \in \Gamma(u)$, the walk traverses to
$w$ with probability $\frac{1}{2d}$. The important property of the random walk is that the stationary
distribution it induces over the vertices is uniform (its transition matrix is symmetric).
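The uniformity of the stationary distribution can be checked exactly on a small example: moving from $u$ to $w$ and from $w$ to $u$ both have probability $1/(2d)$ per edge, so the transition matrix is symmetric and hence doubly stochastic. The Python sketch below, using exact fractions and a hypothetical adjacency-list dict, verifies this on a star, where the ordinary (non-lazy, degree-proportional) simple random walk would not have a uniform stationary distribution.

```python
from fractions import Fraction


def lazy_walk_matrix(adj):
    """Transition probabilities of the walk described above.

    adj maps each vertex to its list of neighbors (symmetric).
    With d = d_max, a vertex of degree d' stays put with probability
    1 - d'/(2d) and moves along each incident edge with probability 1/(2d).
    """
    verts = sorted(adj)
    d = max(len(nbrs) for nbrs in adj.values())
    P = {u: {w: Fraction(0) for w in verts} for u in verts}
    for u in verts:
        for w in adj[u]:
            P[u][w] += Fraction(1, 2 * d)
        P[u][u] += 1 - Fraction(len(adj[u]), 2 * d)
    return P


def uniform_is_stationary(P):
    """Check that pi = uniform satisfies pi P = pi exactly."""
    verts = list(P)
    pi = Fraction(1, len(verts))
    return all(sum(pi * P[u][w] for u in verts) == pi for w in verts)


# A star: center 0 has degree 4, the leaves have degree 1.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
P = lazy_walk_matrix(star)
print(uniform_is_stationary(P))     # True
```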

For every walk (or, more generally, for any sequence of steps), there corresponds a 
path in the graph. The path is determined by those steps in which an edge is traversed 
(while ignoring all steps in which the walk stays at the same vertex). Such a path is 
not necessarily simple, but does not contain self loops. Note that when referring to the 
length of a walk, we mean the total number of steps taken, including steps in which the 
walk remains at the current vertex, while the length of the corresponding path does not 
include these steps. 
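A minimal sketch of the walk/path distinction, assuming the walk is recorded as the list of vertices occupied at each time step and that $G$ has no self-loops (so a repeated vertex can only mean a lazy step):

```python
def walk_to_path(walk):
    """walk: vertices occupied at times 0, 1, ..., L (assumes no self-loops).
    Returns the corresponding path, dropping the lazy 'stay' steps."""
    path = [walk[0]]
    for v in walk[1:]:
        if v != path[-1]:      # an edge was traversed at this step
            path.append(v)
    return path


walk = [1, 1, 2, 2, 2, 3, 2]          # a walk of length 6
path = walk_to_path(walk)
print(path, len(path) - 1)             # [1, 2, 3, 2] 3
```

Here the walk has length 6 while its path has length 3; it is the parity of the path length that matters for detecting odd cycles.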

Theorem 1 The algorithm Test-Bipartite-Reg accepts every graph that is bipartite, and
rejects with probability at least 2/3 every graph that is $\epsilon$-far from bipartite with respect
to $m_{\max} = d_{\max} n$. Furthermore, whenever the algorithm rejects a graph it outputs
a certificate to the non-bipartiteness of the graph in the form of an odd-length cycle of
length $\mathrm{poly}(\log n/\epsilon)$. The query complexity and running time of the algorithm are
$O(\min(\sqrt{n}, n/d_{\max}) \cdot \mathrm{poly}(\log n/\epsilon))$.
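Checking such a certificate is straightforward. The sketch below (a hypothetical helper, with adjacency given as a dict of neighbor sets) accepts a sequence $v_0, \ldots, v_{k-1}$ if it is a closed walk of odd length $k$; any odd closed walk contains an odd simple cycle, so this suffices to prove non-bipartiteness.

```python
def is_odd_cycle_certificate(adj, cycle):
    """True iff cycle = [v_0, ..., v_{k-1}] is a closed walk of odd length k in G,
    i.e. v_t and v_{(t+1) mod k} are adjacent for every t."""
    k = len(cycle)
    if k % 2 == 0:
        return False
    return all(cycle[(t + 1) % k] in adj[cycle[t]] for t in range(k))


c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
tri = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(is_odd_cycle_certificate(tri, [0, 1, 2]))    # True
print(is_odd_cycle_certificate(c4, [0, 1, 2]))     # False: 2 and 0 are not adjacent
```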
Test-Bipartite-Reg($n$, $d_{\max}$, $\epsilon$)

• Repeat $T = \Theta(1/\epsilon)$ times:

1. Uniformly select $s$ in $V$.

2. If Odd-Cycle($s$) returns found then output reject.

• In case no call to Odd-Cycle returned found then output accept.

Odd-Cycle($s$)

1. If $d = d_{\max} \le \sqrt{n}$ then let $K = \Theta(\sqrt{n} \cdot \mathrm{poly}(\log n/\epsilon))$ and $L = \mathrm{poly}(\log n/\epsilon)$. Otherwise

($d > \sqrt{n}$), let $K = \Theta(\sqrt{n/d} \cdot \mathrm{poly}(\log n/\epsilon))$ and $L = \mathrm{poly}(\log n/\epsilon)$.

2. Perform $K$ random walks starting from $s$, each of length $L$.

3. Let $A_0$ ($A_1$) be the set of vertices that appear on the ends of the $K$ walks whose paths are
of even (odd) length.

4. If $d \le \sqrt{n}$ then check whether $A_0 \cap A_1 \neq \emptyset$. If the intersection is non-empty then return
found, otherwise return not-found.

5. Else ($d > \sqrt{n}$), perform vertex-pair queries between every pair of vertices $u, v \in A_0$
($u, v \in A_1$). If an edge is detected then return found, otherwise return not-found.



Fig. 1. Algorithm Test-Bipartite-Reg for testing bipartiteness with respect to the upper bound
$m_{\max} = d_{\max} \cdot n$ on the number of edges, and the procedure Odd-Cycle for detecting odd-length
cycles in the graph G.
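The following Python sketch mirrors Figure 1 on an explicit graph, with neighbor and vertex-pair queries replaced by direct lookups in an adjacency-list dict. The repetition counts $T$, $K$, $L$ are passed in as plain numbers; they are placeholders, not the $\Theta(\cdot)$ settings of the analysis, and the sketch assumes the graph has at least one edge. Because the parity of a path from $s$ is determined by the side of the bipartition its end lies on, the sketch never rejects a bipartite graph, regardless of the random choices.

```python
import math
import random


def lazy_walk(adj, s, length, d, rng):
    """One walk of `length` steps from s; returns (end vertex, path length).
    Each incident edge is taken with probability 1/(2d); otherwise stay."""
    v, path_len = s, 0
    for _ in range(length):
        idx = rng.randrange(2 * d)
        if idx < len(adj[v]):          # neighbor query for the idx-th neighbor
            v = adj[v][idx]
            path_len += 1
    return v, path_len


def odd_cycle(adj, s, K, L, rng):
    """Sketch of Odd-Cycle(s); returns True for 'found'."""
    n = len(adj)
    d = max(len(nbrs) for nbrs in adj.values())   # d = d_max (assumed >= 1)
    A = (set(), set())   # A[0]: ends of even-length paths, A[1]: odd-length
    for _ in range(K):
        end, path_len = lazy_walk(adj, s, L, d, rng)
        A[path_len % 2].add(end)
    if d <= math.sqrt(n):
        return bool(A[0] & A[1])                 # same vertex, both parities
    for part in A:                               # d > sqrt(n)
        verts = sorted(part)
        for i, u in enumerate(verts):
            for w in verts[i + 1:]:
                if w in adj[u]:                  # stands in for a vertex-pair query
                    return True
    return False


def bipartite_reg_tester(adj, T, K, L, seed=0):
    """Sketch of Test-Bipartite-Reg with explicit T, K, L."""
    rng = random.Random(seed)
    vertices = sorted(adj)
    for _ in range(T):
        s = rng.choice(vertices)                 # uniform starting vertex
        if odd_cycle(adj, s, K, L, rng):
            return "reject"
    return "accept"


def cycle(n):
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}


print(bipartite_reg_tester(cycle(6), T=20, K=30, L=20))   # accept (bipartite)
print(bipartite_reg_tester(cycle(5), T=20, K=50, L=40))   # reject with high probability
```

The 6-cycle exercises the $d \le \sqrt{n}$ branch (intersection test), while a triangle or $K_{3,3}$ would exercise the vertex-pair branch.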



Note that the algorithm can work when G contains self-loops and multiple edges. The
latter will be of importance in the next section. The corollary below will become useful
in the next section as well.

Corollary 2 If $G$ is $\epsilon$-far from bipartite with respect to $m_{\max} = d_{\max} n$, then an $\Omega(\epsilon)$-fraction
of its vertices $s$ are such that Odd-Cycle($s$) returns found with probability at
least |. 

Since the proof of Theorem 1 has similar structure to the proof given in [9], we 
omit it from this extended abstract. All details of this proof, as well as other proofs, can 
be found in the full version of this paper [12]. 



4 The Algorithm for the General Case 

In this section we build on the testing algorithm presented in the previous section and
show a one-sided error bipartite testing algorithm that works with respect to the actual
number of edges $m = m(G)$. Hence this algorithm is suitable for general graphs (for
which $d_{\max}$ may vary significantly from $d_{\mathrm{avg}}$). The query complexity and running time
of the algorithm are of the same order of magnitude as for Test-Bipartite-Reg, that is,
$O(\min(\sqrt{n}, n^2/m) \cdot \mathrm{poly}(\log n/\epsilon))$. We note that once the graph becomes very dense,
that is, $m = \Omega(n^2/\log^c n)$ (where $c$ is approximately 4), it is preferable to use the
adjacency-matrix model algorithm [8, 3] with distance parameter $\epsilon/(n^2/m)$.
A High Level Description of the Algorithm. The basic idea is to reduce the problem
of testing with respect to the actual number of edges $m$ to the problem of testing with
respect to the upper bound $m_{\max} = d_{\max} \cdot n$. Specifically, for any graph $G$ we show how
to define a graph $G'$ over $\Theta(n)$ vertices that has the following useful properties. First,
the maximum degree in $G'$ is roughly the same as the average degree, and furthermore,
this degree is roughly the same as the average degree in $G$. In particular this implies
that the two graphs have roughly the same number of edges. Second, $G'$ approximately
preserves the distance of $G$ to bipartiteness. More precisely, if $G$ is bipartite then so is
$G'$, but if $G$ is far from bipartite with respect to $m(G)$, then $G'$ is far from bipartite
with respect to $m_{\max} = d_{\max}(G') n'$. Thus $G'$ can be viewed as a kind of “regularized-degree
version” of $G$.

If we had direct access to $G'$, then by the above we would be done: by running the
algorithm Test-Bipartite-Reg on $G'$ we could decide whether $G$ is bipartite or far from
bipartite. However, we only have access to $G$. Nonetheless, given query access to $G$
we can efficiently “emulate” queries in $G'$. This would almost suffice for running
Test-Bipartite-Reg on $G'$. One more issue is the uniform selection of starting vertices in $G'$,
required by Test-Bipartite-Reg. As we shall see, selecting a vertex uniformly from $G'$
is (roughly) equivalent to uniformly selecting an edge in $G$. We shall approximate the
latter process.

In what follows we assume that $m > n$ and that there are no multiple edges (we can
actually deal with the case in which there are multiple edges, as long as they do not
constitute more than a constant fraction of the total number of edges).

The main theorem of this subsection follows. 

Theorem 3 For every graph $G$ having $n$ vertices and $m > n$ edges, we can define a
graph $G'$ having $n'$ vertices and $m'$ edges for which the following holds:

1. $n \le n' \le 3n$, $m \le m' \le 6m$, and $d_{\max}(G') \le 2 d_{\mathrm{avg}}(G)$.

2. If $G$ is bipartite then $G'$ is bipartite, and if $G$ is $\epsilon$-far from bipartite with respect
to $m$, then $G'$ is $\epsilon'$-far from bipartite with respect to $m_{\max}(G') = d_{\max}(G') n'$ for
$\epsilon' = \Theta(\epsilon)$.

3. Given a starting vertex $s$ in $G'$, it is possible to emulate random walks in $G'$
starting from $s$ by performing queries to $G$. The amortized cost of each random walk
step is polylogarithmic in $n$ (degree and neighbor queries in $G$). By emulating these
random walks it is possible to execute a slight variant of Odd-Cycle($s$) in $G'$, which
we denote Odd-Cycle'($s$). This variant is such that $\Pr[\text{Odd-Cycle}'(s) = \text{found}] \ge
\Pr[\text{Odd-Cycle}(s) = \text{found}]$, where if Odd-Cycle'($s$) returns found, then we can
obtain an odd-length cycle of length $\mathrm{poly}(\log n/\epsilon)$ in the original graph $G$.

4. There exists a procedure Sample-Vertices-Almost-Uniformly-in-G' that for any
given parameter $0 < \delta < 1$ performs $O(\min(\sqrt{n/\delta}, n^2/m))$ queries in $G$ and
returns a vertex in $G'$ such that the following holds: for all but at most $\delta n'$ of the
vertices $x$ in $G'$, the probability that $x$ is selected by the procedure is $\Omega(1/n')$.

We note that for every graph $G$ there is actually a family of graphs $G'$ with the above
properties (all defined over the same set of vertices). When we run algorithm
Test-Bipartite-Gen, we construct one such (arbitrary) graph $G'$ in the family as we go along.
As a corollary to Theorem 3 and Corollary 2 we obtain: 
Corollary 4 Algorithm Test-Bipartite-Gen (see Figure 2) accepts every graph $G$ that
is bipartite, and rejects with probability at least 2/3 every graph $G$ that is $\epsilon$-far from
bipartite (with respect to $m(G)$). Furthermore, whenever the algorithm rejects a graph
it outputs a certificate to the non-bipartiteness of the graph $G$ in the form of an odd-length
cycle of length $\mathrm{poly}(\log n/\epsilon)$.

The query complexity and running time of the algorithm are
$O(\min(\sqrt{n}, n^2/m) \cdot \mathrm{poly}(\log n/\epsilon))$.



Test-Bipartite-Gen($n$, $d_{\mathrm{avg}}$, $\epsilon$)

• Repeat $T = \Theta(1/\epsilon)$ times:

1. Set $\epsilon' = \epsilon/108$.

2. Select a vertex $s$ in $G'$ by calling the procedure Sample-Vertices-Almost-Uniformly-in-G'
with $\delta = \epsilon/c$ (where $c$ is a sufficiently large constant).

3. Apply Odd-Cycle'($s$).

4. If Odd-Cycle'($s$) returns found then output reject.

• In case no call to Odd-Cycle' returned found then output accept.



Fig. 2. Algorithm Test-Bipartite-Gen for testing bipartiteness with respect to the actual number 
of edges m = m(G) in the graph G. 



4.1 Defining G' and Proving the First Item in Theorem 3 

In all that follows, let $d = d_{\mathrm{avg}}(G)$, and let $d' = d_{\max}(G')$. We shall assume that $d$
is a sufficiently large constant. If $d_{\mathrm{avg}}(G)$ is not sufficiently large then we still set $d$
in the construction below to be sufficiently large, and run the algorithm with $\epsilon$ set to
$\epsilon/(d/d_{\mathrm{avg}}(G))$.

The Construction of G'. For each vertex $v$ in $G$ such that $\deg(v) \le d$, we have a single
vertex in $G'$. For each vertex $v$ in $G$ such that $\deg(v) > d$ we have in $G'$ a subgraph,
denoted $H(v)$. It is a bipartite graph over two subsets of vertices, one denoted $X(v)$, the
external part, and one denoted $I(v)$, the internal part. Both parts consist of $\lceil \deg(v)/d \rceil$
vertices. Every vertex in $X(v)$ represents up to $d$ specific neighbors of $v$ according to
some fixed, but arbitrary partition of the neighbors of $v$. We refer to the vertices in
the two subsets by $X_1(v), \ldots, X_{\lceil \deg(v)/d \rceil}(v)$ and $I_1(v), \ldots, I_{\lceil \deg(v)/d \rceil}(v)$, respectively. The edges in
$H(v)$ are determined as follows. In case $\deg(v)/d \le d$ then we have $\lceil d^2/\deg(v) \rceil$
multiple edges between every internal vertex and every external vertex in $H(v)$. In case
$\deg(v)/d > d$, denote $s = \lceil \deg(v)/d \rceil$ and let $H(v)$ be a bipartite expander where each
of its sides has $s$ vertices ($s > d$). Each vertex in $H(v)$ has degree $d$. All eigenvalues of
the adjacency matrix of $H(v)$, but the largest one and the smallest one (which are equal to $d$
and $-d$, respectively), are at most $d/4$ in their absolute values. Explicit constructions of
such expanders can be found, e.g., in [14, 13]. Furthermore, these constructions allow
the determination of the $i$-th neighbor of any given vertex in constant time.
We have described how vertices of $G$ are transformed into vertices of $G'$. It
remains to describe the relevant transformation to the edges of $G$. Consider an edge
$(u,v) \in E(G)$ where $v$ is the $i$-th neighbor of $u$ and $u$ is the $j$-th neighbor of
$v$. Let $X_k(u)$ and $X_\ell(v)$ be the external vertices representing the $i$-th neighbor of
$u$ and the $j$-th neighbor of $v$, respectively. Then, there is an edge $(X_k(u), X_\ell(v))$
in $G'$. It directly follows that every vertex in $G'$ has degree at most $2d$ and that
$n' = |V(G')| \le 3n$, and $m' = m(G') \le 3dn = 6m$.
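As an illustration of this edge mapping, the sketch below fixes one possible choice of the "fixed, but arbitrary" partition: consecutive blocks of $d$ neighbors, so that the $i$-th neighbor of $v$ is represented by $X_k(v)$ with $k = \lceil i/d \rceil$. The tuple encoding of $G'$ vertices is hypothetical, and the internal part $I(v)$ is not modeled.

```python
from collections import Counter


def external_index(i, d):
    """k such that the i-th neighbor (1-indexed) is represented by X_k(v),
    assuming the neighbors are split into consecutive blocks of d."""
    return (i - 1) // d + 1            # equals ceil(i / d) for i >= 1


def image_of_edge(u, v, i, j, deg, d):
    """End-points in G' of the G-edge (u, v, i, j), with v the i-th neighbor
    of u and u the j-th neighbor of v. A vertex of degree <= d stays a single
    vertex (w,); otherwise the edge attaches to the external vertex (w, 'X', k)."""
    def endpoint(w, label):
        if deg[w] <= d:
            return (w,)
        return (w, "X", external_index(label, d))
    return endpoint(u, i), endpoint(v, j)


# Vertex 1 has degree 10 and d = 4: its neighbors are spread over X_1, X_2, X_3.
load = Counter(external_index(i, 4) for i in range(1, 11))
print(sorted(load.items()))     # [(1, 4), (2, 4), (3, 2)]
print(image_of_edge(1, 2, 9, 2, {1: 10, 2: 3}, 4))   # ((1, 'X', 3), (2,))
```

With this choice each external vertex receives at most $d$ of the original edges, which is the "up to $d$ specific neighbors" condition in the text.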

In the long version of this paper [12] we suggest the following alternative
probabilistic construction of $G'$ that establishes Theorem 3. Every vertex $v$ of $G$ is
transformed into $\lceil \deg(v)/d \rceil$ vertices. Denote by $X(v)$ the vertices in $G'$ related to a vertex
$v \in V(G)$. The vertices in $X(v)$ are denoted by $X_i(v)$, $1 \le i \le \lceil \deg(v)/d \rceil$. Thus,
$n' = |V(G')| = \sum_{v} \lceil \deg(v)/d \rceil \le \sum_{v} (\deg(v)/d + 1) = 2n$. The edges of $G'$ are determined as follows: an
edge $(u, v) \in E(G)$ chooses independently and uniformly at random a vertex from $X(u)$
and a vertex from $X(v)$. In $G'$ there will be an edge between these two randomly chosen
vertices. Clearly, $m' = |E(G')| = |E(G)| = nd/2$.
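A minimal Python sketch of this probabilistic construction follows; the encoding of the vertex $X_i(v)$ as the pair (v, i) is an illustrative assumption. Since every $G'$-edge joins a copy of $u$ to a copy of $v$ for some $(u,v) \in E(G)$, any 2-coloring of $G$ lifts to $G'$, which is the easy direction of Item 2.

```python
import math
import random


def probabilistic_g_prime(adj, d, rng):
    """Vertex v of G becomes the copies (v, 1), ..., (v, ceil(deg(v)/d));
    each edge (u, v) of G picks a copy of u and a copy of v uniformly and
    independently and becomes an edge between them.
    adj: vertex -> list of neighbors of a simple graph with comparable labels."""
    copies = {v: math.ceil(len(nbrs) / d) for v, nbrs in adj.items()}
    vertices = [(v, i) for v, c in copies.items() for i in range(1, c + 1)]
    edges = []
    for u, nbrs in adj.items():
        for v in nbrs:
            if u < v:                         # each edge of G exactly once
                edges.append(((u, rng.randint(1, copies[u])),
                              (v, rng.randint(1, copies[v]))))
    return vertices, edges


# C_6 with d = 1 (below its average degree 2, just to get 2 copies per vertex).
c6 = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
vertices, edges = probabilistic_g_prime(c6, 1, random.Random(0))
# Lift the 2-coloring v -> v % 2 of C_6 to G': no edge is monochromatic.
print(len(vertices), len(edges), all(a[0] % 2 != b[0] % 2 for a, b in edges))
```

The number of edges is preserved exactly; only their end-points are spread over the copies, which is why a lower bound such as $d = \Omega(1/\epsilon)$ is needed for the degrees to concentrate.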

The probabilistic construction is simpler and more robust than the deterministic
one, and it may be applicable to other problems as well. However, in this construction
we need $d = \Omega(1/\epsilon)$.

4.2 Establishing Items 2 and 3 in Theorem 3 

The proofs of these two items are omitted from this extended abstract, and can be
found in [12]. We note that Item 2 builds on the expander graphs defined in the
construction of $G'$.

4.3 Establishing Item 4 in Theorem 3 

In this subsection we provide a sketch of the proof of the last item in Theorem 3.
Consider the construction of $G'$. Sampling a vertex uniformly at random from $G'$ is
equivalent to sampling a vertex $v$ from $G$ with probability proportional to its degree
(and then taking uniformly at random one of the vertices belonging to $H(v)$). The
latter is equivalent to sampling an edge uniformly at random from $G$, and taking one of
its end-points at random. Thus, the proof of this item is based on a procedure for
sampling edges almost uniformly from $G$.

We consider two cases: $d > \sqrt{\delta n}$ and $d \le \sqrt{\delta n}$ (recall that $d = d_{\mathrm{avg}}(G)$ is the
average degree in $G$ and that our goal is to use $O(\min(\sqrt{n/\delta}, n/d))$ queries to $G$). The
first case is easy since if $G$ contains sufficiently many edges then we simply sample
$O(n/d) = O(n^2/m)$ pairs of vertices in order to obtain an edge.
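The dense case amounts to rejection sampling, sketched below in Python with a hypothetical `pair_query` callback standing in for vertex-pair queries. An ordered pair $(u,v)$ with $u \neq v$ is an edge with probability $2m/(n(n-1))$, so the expected number of queries is $n(n-1)/(2m) = O(n^2/m)$. Conditioned on success, the pair is a uniformly random oriented edge, so its first coordinate is a vertex chosen with probability $\deg(u)/(2m)$, which is what selecting a vertex of $G'$ requires.

```python
import random


def sample_edge_by_pair_queries(n, pair_query, rng, max_tries=10**6):
    """Rejection sampling with vertex-pair queries (pair_query(u, v) -> bool).
    Returns an ordered pair (u, v) that is uniform over the 2m orientations
    of the edges, or None if all max_tries queries failed."""
    for _ in range(max_tries):
        u, v = rng.sample(range(n), 2)        # uniform ordered pair, u != v
        if pair_query(u, v):
            return u, v
    return None


# Star on 5 vertices (center 0, m = 4): the first coordinate of the sampled
# oriented edge is 0 with probability deg(0)/(2m) = 1/2.
E = {frozenset((0, k)) for k in range(1, 5)}
rng = random.Random(7)
draws = [sample_edge_by_pair_queries(5, lambda a, b: frozenset((a, b)) in E, rng)
         for _ in range(20000)]
print(sum(1 for u, _ in draws if u == 0) / len(draws))   # close to 0.5
```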

In the second case, where $G$ contains fewer edges ($d \le \sqrt{\delta n}$), we do not have an
algorithm that selects an edge uniformly from $G$ (using relatively few queries). However,
we can show the following lemma, from which Item 4 in Theorem 3 can be derived.
The proof of this lemma can be found in [12].

Lemma 1 There exists a procedure Sample-Edges-almost-Uniformly-in-G that uses
$O(\sqrt{n/\delta})$ degree and neighbor queries in $G$ and for which the following holds: For all
but $(\delta/4)m$ of the edges $e$ in $G$, the probability that the procedure outputs $e$ is at least






Tali Kaufman, Michael Krivelevich, and Dana Ron 



$1/(64m)$. Furthermore, there exists a subset $U_0 \subseteq V(G)$, $|U_0| \le \delta n/2$, such that
for all edges $e = (u, v)$ that are output with probability less than $1/(64m)$, we have
$u, v \in U_0$.



5 A Lower Bound 

In this section we present a lower bound on the number of queries necessary for testing
bipartiteness. Similarly to the lower bound presented in [9], this lower bound holds
for testing algorithms that are allowed a two-sided error, and the graphs used for the
lower bound construction are regular graphs. However, the lower bound of $\Omega(\sqrt{n})$ (for
constant $\epsilon$) established in [9] holds for graphs having constant degree (e.g., degree 3),
and when the algorithm is allowed only neighbor queries. Our lower bound is more
general in that it allows the algorithm to perform both neighbor queries and vertex-pair
queries, and it is applicable to all graphs.

Theorem 5 Every algorithm for testing bipartiteness with distance parameter $\epsilon < 2^{-4}$
must perform $\Omega(\min(\sqrt{n}, n^2/m))$ queries.

The high-level structure of our proof is similar to other lower-bound proofs for
testing, which can be traced back to [17]. We present two distributions over graphs, where
all graphs generated by one distribution are bipartite (and hence should be accepted),
while with very high probability a graph generated according to the other distribution
is far from bipartite. We then show that any algorithm with query complexity below the
lower bound cannot distinguish between the two distributions (and hence must have a
large failure probability).

Specifically, both distributions, denoted $\mathcal{G}(n,d)$ and $\mathcal{G}(n/2,n/2,d)$, are over
$d$-regular graphs having $n$ vertices, where we assume for simplicity that $n$ is even. A
graph generated according to $\mathcal{G}(n,d)$ is obtained by selecting, uniformly and
independently, $d$ perfect matchings between the $n$ vertices. A graph generated according to
$\mathcal{G}(n/2,n/2,d)$ is obtained by first randomly partitioning the $n$ vertices into two equal
parts, and then selecting, uniformly and independently, $d$ perfect matchings between the
two parts. By definition, all graphs in the support of $\mathcal{G}(n/2,n/2,d)$ are bipartite, and
we prove that graphs generated according to $\mathcal{G}(n,d)$ are $\epsilon$-far from bipartite with high
probability, for $\epsilon < 1/16$ and $d > 64$.
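Both distributions are easy to sample. The sketch below (illustrative function names; edges are kept as a list so that repeated edges survive and the multigraph is exactly $d$-regular) draws a uniform perfect matching by shuffling the vertices and pairing consecutive entries.

```python
import random
from collections import Counter


def random_perfect_matching(vertices, rng):
    """Uniform perfect matching: shuffle, then pair consecutive entries."""
    vs = list(vertices)
    rng.shuffle(vs)
    return [(vs[k], vs[k + 1]) for k in range(0, len(vs), 2)]


def sample_gnd(n, d, rng):
    """G(n, d): union of d independent uniform perfect matchings (n even)."""
    edges = []
    for _ in range(d):
        edges.extend(random_perfect_matching(range(n), rng))
    return edges


def sample_bipartite_gnd(n, d, rng):
    """G(n/2, n/2, d): random equipartition (left, right), then d independent
    uniform perfect matchings between the two parts."""
    vs = list(range(n))
    rng.shuffle(vs)
    left, right = vs[: n // 2], vs[n // 2:]
    edges = []
    for _ in range(d):
        perm = right[:]
        rng.shuffle(perm)
        edges.extend(zip(left, perm))
    return edges, set(left)


rng = random.Random(3)
g = sample_gnd(10, 3, rng)
h, left = sample_bipartite_gnd(10, 3, rng)
deg = Counter(x for e in g for x in e)
print(sorted(deg.values()) == [3] * 10)                  # True: 3-regular
print(all((u in left) != (v in left) for u, v in h))      # True: bipartite
```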

We then show that the following two claims hold when a graph is generated either
according to $\mathcal{G}(n,d)$ or according to $\mathcal{G}(n/2,n/2,d)$: (1) Any algorithm that asks
$o(n^2/m) = o(n/d)$ queries will not detect an edge by any vertex-pair query, with
very high probability. (2) Any algorithm that asks $o(\sqrt{n})$ queries will not receive, as
an answer to any neighbor query, a vertex it has already observed in a previous query
(with very high probability as well). From this we can conclude that any algorithm
that asks $o(\min(\sqrt{n}, n^2/m))$ queries cannot distinguish between the two distributions,
as desired. In the lower bound proof we show the necessity of both neighbor queries
and vertex-pair queries. Specifically, if only one type of query is used, the lower bound
increases.
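For intuition only, the two claims rest on union-bound calculations of roughly the following form, written for $\mathcal{G}(n,d)$ with matchings $M_1, \ldots, M_d$ and ignoring the conditioning on earlier answers that the actual proof has to handle (for $\mathcal{G}(n/2,n/2,d)$, $n-1$ is replaced by $n/2$). The $2t$ term counts the at most two vertices revealed by each earlier query.

```latex
\begin{aligned}
\Pr[(u,v) \in E] &\le \sum_{t=1}^{d} \Pr[(u,v) \in M_t] = \frac{d}{n-1},
  \qquad
  \Pr[\text{some of } q \text{ pair queries finds an edge}] \le \frac{q\,d}{n-1},\\
\Pr[\text{some of } q \text{ neighbor answers was seen before}]
  &\lesssim \sum_{t=1}^{q} \frac{2t}{n} = O\!\left(\frac{q^2}{n}\right).
\end{aligned}
```

The first bound is $o(1)$ when $q = o(n/d) = o(n^2/m)$ (using $m = nd/2$), and the second when $q = o(\sqrt{n})$.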
Abstract. This paper addresses the question: what processes take polynomial
time on a quantum computer that require exponential time classically? We show
that the hitting time of the discrete time quantum random walk on the n-bit
hypercube from one corner to its opposite is polynomial in n. This gives the first
exponential quantum-classical gap in the hitting time of discrete quantum walks.
We provide the basic framework for quantum hitting time and give two alternative
definitions to set the ground for its study on general graphs. We outline a possible
application to sequential packet routing.



1 Introduction 

Random walks form one of the cornerstones of theoretical computer science as well 
as the basis of a broad variety of applications in mathematics, physics and the natural 
sciences. In computer science they are frequently used in the design and analysis of 
randomized algorithms. Markov chain simulations provide a paradigm for exploring an 
exponentially large set of combinatorial structures (such as assignments to a Boolean 
formula or matchings in a graph) by a sequence of simple, local transitions. As algorithmic
tools they have been applied to a variety of central problems, such as approximating
the permanent [JS89], finding satisfying assignments for Boolean formulas [Sch99] and 
the estimation of the volume of a convex body [DFK91]. Other well-known examples of 
algorithms based on random walks include 2-SAT, Graph Connectivity and probability 
amplification [MR95, Pap94]. 

Recently the study of quantum walks has been initiated, with the hope of bringing
new powerful algorithmic tools into the setting of quantum computing. To this day
nearly all efficient quantum algorithms are based on the Quantum Fourier Transform
(QFT), like Simon’s period-finding algorithm [Sim97] or Shor’s celebrated algorithms
for Factoring and Discrete Log [Sho97]. However, it seems that the power of the QFT
might be limited when it comes to solving similar problems on non-Abelian groups, such as the
symmetric group in the case of Graph Isomorphism [HRT00, GS+01]. It seems crucial to develop new
algorithmic tools.

Several striking differences between classical and quantum discrete walks have
already been observed for walks on the cycle [AA+01], the line [AB+01] and the
hypercube [MR02]. The reason for this is quantum interference. Whereas there cannot
be destructive interference in a classical random walk, in a quantum walk two separate
S. Arora et al. (Eds.): APPROX 2003+RANDOM 2003, LNCS 2764, pp. 354-369, 2003.

© Springer-Verlag Berlin Heidelberg 2003



Discrete Quantum Walks Hit Exponentially Faster



paths leading to the same point may be out of phase and cancel out. The focus of
previous work has been primarily on the mixing time of a discrete quantum walk. It has been
shown that quantum walks on a large class of graphs can mix nearly quadratically faster
than their classical counterparts. Since mixing times are an important quantity for many
classical algorithms, this has raised the question of whether quantum walks can mix
exponentially faster. However, in [AA+01] a lower bound on the mixing time of any
local quantum walk has been obtained, which implies in essence that quantum walks can
mix at most quadratically faster than classical walks (this is exactly true for bounded
degree graphs; for graphs of maximal degree d this speed-up may be enhanced by a
factor of 1/d). This result showed that in all likelihood quantum walks cannot drastically
enhance mixing times of classical walks.

In this paper we set the stage to exactly analyze another crucial quantity of
discrete time walks: the hitting time. The hitting time is important in many algorithmic
applications of classical random walks, like k-SAT or Graph Connectivity. For instance,
the most efficient known solution to 3-SAT is based on the hitting time of a random
walk [Sch99]. In the algorithmic context, the question whether a quantum process can
achieve an exponentially faster penetration of graphs was first raised by Farhi
and Gutmann [FG98]. For the continuous time quantum walk, a different model from
the one we analyze, Farhi et al. gave a mixture of analytical and numerical evidence
of an exponential gap in hitting behavior [FG98, CFG02]. Very recently, after our work was
completed, Childs et al. succeeded in giving an oracle-based algorithmic
exponential speed-up between classical and quantum query complexity based on the
quantum continuous-time walk [CC+02]. They are able to construct a family of
random graphs with two special nodes such that on average any classical algorithm that
needs to find the sink node starting from the source node requires an exponential
number of queries, whereas the quantum algorithm succeeds in polynomial time. The
continuous-time quantum walk at the base of that example is different from the discrete
time model we analyze and it is a priori not clear how both models are related. Even
though their beautiful result proves a rigorous separation between the classical and the
quantum setting, the wider applicability of their example is questionable at the moment.
It is important to rigorously establish the notions and methods for hitting behaviour of
quantum walks, in particular in the discrete case, and to analyze it for other graphs and
structures. Our work provides a step in this direction.

The hitting time huv of node v starting from node u measures the expected time it 
takes until the walk hits v for the first time. In the quantum case we face a dilemma: as 
is well known, observations of the quantum system (like “Has the walk hit node v?”) 
influence the state of the quantum system. In particular if one were to observe the posi- 
tion of the quantum walk at each time it would lose its quantum coherence and reduce 
(“collapse”) to the standard classical random walk, in which case we cannot expect any 
non-classical behavior or speed-ups. We give two alternatives out of this dilemma and 
establish two different notions of “quantum hitting time”. In the first case the walk is 
not observed at all. Started at node u, the position of the walk is measured at a (previously
determined) time T. If the probability p to be at node v at time T is sufficiently
large (an inverse polynomial in the graph size) we call T a “one-shot p hitting time”. In 
the second case (“concurrent measurement”) we do not require any previous knowledge 
of when to measure the position of the walk. Starting from node u at every step of the
walk a partial measurement is performed (only the question “Is the position v or not v?” 
is asked). If the walk is found to have hit node v, it is stopped, otherwise the next step 
follows. This measurement perturbs the walk slightly but does not kill all the quantum 
coherence at once. If after a time T the probability p to halt is bounded below by an 
inverse polynomial in the size of the graph, we call T a “concurrent p hitting time”. 
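To make the two notions concrete, here is a small pure-Python simulation. The walk model is an assumption, since it is not spelled out in this excerpt: a coined walk on the n-bit hypercube with the Grover coin (2/n)J - I on the n coin states, followed by the shift that moves along the coin direction, started at 00...0 with a uniform coin state; the target is 11...1. In one-shot mode it returns the probability of being at the target at each time t; in concurrent mode it performs the partial measurement "Is the position the target?" after every step, discards the "yes" branch, and returns the cumulative halting probability.

```python
def hypercube_walk(n, steps, concurrent=False):
    """Coined quantum walk on the n-bit hypercube (assumed model: Grover coin,
    then shift |j, x> -> |j, x XOR 2^j>), started at vertex 0 with a uniform
    coin state; the target is the opposite corner 2^n - 1.
    Returns (values, psi): values[t-1] is Pr[at target at time t] in one-shot
    mode, or the cumulative halting probability up to time t in concurrent
    mode; psi is the final (possibly unnormalized) state psi[x][j]."""
    N, target = 1 << n, (1 << n) - 1
    psi = [[0j] * n for _ in range(N)]
    psi[0] = [complex(n ** -0.5)] * n
    values, halted = [], 0.0
    for _ in range(steps):
        # Coin: Grover diffusion (2/n)J - I applied to the coin register at x.
        coined = []
        for x in range(N):
            s = 2 * sum(psi[x]) / n
            coined.append([s - a for a in psi[x]])
        # Shift: the amplitude for direction j moves from x to x XOR 2^j.
        psi = [[0j] * n for _ in range(N)]
        for x in range(N):
            for j in range(n):
                psi[x ^ (1 << j)][j] = coined[x][j]
        p_target = sum(abs(a) ** 2 for a in psi[target])
        if concurrent:
            halted += p_target                 # outcome "position is the target"
            psi[target] = [0j] * n             # outcome "not the target": project
            values.append(halted)
        else:
            values.append(p_target)
    return values, psi


one_shot, _ = hypercube_walk(4, 12)
print([round(p, 3) for p in one_shot])
```

Keeping the unnormalized "not the target" branch is a standard way to track a repeated partial measurement: its squared norm plus the accumulated halting probability always equals 1.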

After having made these notions rigorous we are able to show that on the hypercube 
both definitions of quantum hitting time lead to polynomial quantities for the walk from 
one corner to the opposite corner. This is in stark contrast to the classical case, where the 
corner-to-corner hitting time is exponential. Our result provides the first fully analytical 
classical-quantum exponential gap for a discrete quantum walk on a graph. It opens
the possibility that quantum algorithms based on random walks may significantly
improve upon classical algorithms. We will state similar results for the continuous-time
quantum walk and also outline a possible application of rapid hitting on the hypercube: 
“quantum-random” sequential routing in a network. 

It is interesting to know how much the exponential speed-up of the quantum walk 
depends on the choice of initial and final position. We establish two bounds: a lower
bound on the size of the neighborhood of one corner from which we still achieve
polynomial hitting behavior to the opposite corner, and an upper bound on this
neighborhood. The latter derives from a lower bound on quantum unstructured search
algorithms [BB+97].

While quantum walks are very easy to describe, they appear to be quite difficult to 
analyze. Standard techniques for analyzing classical random walks are apparently of 
little use. Whereas in the classical case most quantities depend only on the gap between 
the first and second largest eigenvalue of the underlying chain, in the quantum case 
all eigenvalues seem to play an equally important role and new methods are needed. 
We hope that establishing the rigorous notions and necessary techniques will help to 
analyze quantum walks on a variety of graphs. 

Related Work: Various quantum walk variants have previously been studied by several 
authors. The general framework for discrete quantum walks is introduced in 
[Mey96, Wat01, AA+01, AB+01]. The mixing time of the quantum random walk on
the hypercube has been analysed in [MR02], both in the discrete and continuous time 
setting. We use the spectral decomposition of [MR02] in our analysis. However, the 
results in [MR02] regard only the mixing time of the walk and do not deal with hitting 
times. In [AB+01] a notion of “halting” and intermediate partial measurement similar to 
our concurrent measurement is used, but the results regard the total halting probability 
of the quantum walk, and not the expected hitting time. Numerical studies of the hitting 
time on the hypercube have been communicated to us by Tomohiro Yamasaki [Yam01]
(published in [YKI02] after our work had been completed). A quantum search algorithm
based on the discrete walk on the hypercube has recently been found [SKW03]. 

A different model of quantum random walks, so called continuous time walks, has 
been introduced by Farhi and Gutmann [FG98]. They are defined via a Hamiltonian 
that stems from the generating matrix of the classical continuous random walk. Until 
now it is not clear how their model is related to the discrete case we analyze. For their 
random walk model Farhi and Gutmann first exhibited an infinite tree and a walk that 
hits a set of leaves with inverse polynomial probability in polynomial time (similar
to our notion of “one-shot hitting time”), where the classical analog has exponential 
hitting time. Later in [CFG02] another finite graph with a similar property is presented; 
both proofs are partly analytic and partly numeric, however. After the completion of the 
present work Childs et al. [CC+02] were able to construct a family of graphs based on
the one in [CFG02] and to show that the continuous-time random walk gives rise to an 
exponential algorithmic speed-up between average case classical query complexity and 
its quantum version for the problem to find a very specific node in this graph. 

Structure of the paper: We begin by reviewing in Sec. 2 the necessary background 
on classical random walks, quantum computation and quantum discrete time walks on 
graphs and in particular on the hypercube. In Sec. 3 we introduce the relevant definitions 
of quantum hitting times, and state and prove the upper bounds on quantum hitting 
times on the hypercube. In Sec. 4 we provide upper and lower bounds on the size of the 
neighborhood of a node from which the quantum random walk has polynomial hitting 
behavior to the opposite corner. In Sec. 5 we outline a quantum routing application. In 
Appendix A we compare continuous-time random walks to discrete walks and establish 
analogous results for their hitting time. 



2 Background 

2.1 Random Walks 

Here we will state a few specific definitions and theorems as they are relevant to the 
present work to compare the behavior of classical and quantum walks (for a more complete
treatment see e.g. [MR95, AF01]).

Simple Random Walk: A simple random walk on an undirected graph $G(V,E)$ is described
by repeated applications of a stochastic matrix $P$, where $P_{uv} = 1/d_u$ if $(u,v)$ is an
edge in $G$ and $d_u$ is the degree of $u$. If $G$ is connected and non-bipartite, then the
distribution of the random walk, $D^t = P^t D^0$, converges to a stationary distribution $\pi$
which is independent of the initial distribution $D^0$. If a simple random walk on a bipartite
graph has some periodicity (there is a state $i$ and an initial distribution $D^0$ such that
$D^t_i > 0$ iff $t$ belongs to the arithmetic progression $\{a + ms : m \ge 0\}$ for some integers
$a \ge 0$ and $s \ge 2$), the introduction of a resting probability will make the walk aperiodic
and convergent to $\pi$. For $d$-regular graphs $G$ (all nodes of the same degree $d$), the
limiting probability distribution is uniform over the nodes of the graph.

Hitting Time: Given an initial state $i$, the probability that the first transition into a state
$j$ occurs at time $t$ is denoted by $r_{ij}^t$. The hitting time $h_{ij}$ is the expected number of
steps to reach state $j$ starting from state $i$ and is given by $h_{ij} = \sum_{t>0} t\, r_{ij}^t$. For
aperiodic simple random walks the Fundamental Theorem of Markov Chains implies that the
expected time between two consecutive visits of state $i$ in the stationary state is $1/\pi_i$,
i.e. $h_{ii} = 1/\pi_i$.

Hypercube: The stationary distribution of the simple aperiodic random walk on the $n$-bit
hypercube is given by $\pi_i = 1/2^n$. The hitting time from one node $i$ to the opposite
corner $j$ of the cube is exponential in $n$, $h_{ij} = 2^n\big(1 + \frac{1}{n} + O(\frac{1}{n^2})\big)$.
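As a sanity check on this classical baseline, here is a minimal Python sketch (our own illustration, not part of the paper's analysis) that computes the exact corner-to-corner hitting time of the simple, non-lazy walk by projecting it onto the Hamming distance from the start, which gives a birth-death chain. The closed form used for the one-level passage times $T_k$ is the standard birth-death formula and is an assumption of the sketch; for moderate $n$ the ratio $h/2^n$ should sit close to $1 + 1/n$.

```python
from fractions import Fraction
from math import comb


def srw_antipodal_hitting_time(n: int) -> Fraction:
    """Exact expected hitting time from a corner of the n-cube to the opposite corner.

    The simple (non-lazy) walk is projected onto its Hamming distance k from the start.
    From distance k it moves to k+1 with probability (n-k)/n, so the expected time to
    go from k to k+1 is T_k = sum_{j<=k} C(n, j) / C(n-1, k) (birth-death chain formula).
    """
    return sum(
        (Fraction(sum(comb(n, j) for j in range(k + 1)), comb(n - 1, k)) for k in range(n)),
        Fraction(0),
    )


if __name__ == "__main__":
    for n in (4, 8, 12, 16, 20):
        h = srw_antipodal_hitting_time(n)
        # n, exact hitting time, ratio h / 2^n, and the leading-order prediction 1 + 1/n
        print(n, float(h), float(h) / 2**n, 1 + 1 / n)
```

Exact rational arithmetic avoids any rounding doubts for small $n$; for $n = 2$ the function returns 4, which matches a hand computation on the square.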
Continuous time walk: The theory of continuous time Markov chains closely parallels that of
discrete time chains. A continuous chain is specified by non-negative transition rates
$q_{ij}$. Given that the state of the system at time $t$ is $X_t = i$, the probability that
$X_{t+dt} = j$ is $q_{ij}\,dt$. One can define $q_{ii} = -\sum_{j \ne i} q_{ij}$ to obtain a matrix $Q$. The
state of the system with initial state $D^0$ is then given by $D^t = \exp(Qt)\,D^0$. All the
results on convergence and hitting essentially carry over to the continuous case with only
slight modifications. To transition from discrete to continuous one can “discretize” a
continuous chain by setting $P = \exp(Q)$ or make a discrete chain continuous by setting
$q_{ij} = p_{ij}$ for $i \ne j$. Stationary distribution and mean hitting times remain unchanged.
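For the hypercube, the discrete-to-continuous recipe above can be illustrated with a minimal numpy sketch (an illustration under our own conventions, not code from the paper). It builds $Q$ with $q_{ij} = p_{ij} = 1/n$ for neighbours and $q_{ii} = -1$, evaluates $\exp(Qt)$ through the spectral decomposition of the symmetric matrix $Q$, and lets one check that a distribution started at one corner approaches the uniform distribution $\pi_i = 1/2^n$. Distributions are column vectors, matching $D^t = \exp(Qt)D^0$.

```python
import numpy as np


def hypercube_generator(n: int) -> np.ndarray:
    """Rate matrix Q with q_ij = p_ij = 1/n for neighbours i != j and q_ii = -1."""
    N = 2 ** n
    Q = -np.eye(N)
    for x in range(N):
        for b in range(n):
            Q[x ^ (1 << b), x] += 1.0 / n   # rate for the jump x -> x xor e_b
    return Q


def transition_matrix(Q: np.ndarray, t: float) -> np.ndarray:
    """exp(Qt) via the eigendecomposition of the symmetric matrix Q."""
    evals, evecs = np.linalg.eigh(Q)
    return (evecs * np.exp(evals * t)) @ evecs.T


n = 4
Q = hypercube_generator(n)
D0 = np.eye(2 ** n)[:, 0]                  # all mass on the corner 0...0
D_late = transition_matrix(Q, 50.0) @ D0   # close to the uniform distribution
```

Using `eigh` relies on $Q$ being symmetric, which holds here because the hypercube is regular; for a general rate matrix a general matrix exponential routine would be needed instead.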

2.2 Quantum Computation 

The model. Consider a finite Hilbert space $\mathcal{H}$ with an orthonormal set of basis states
$|s\rangle$ for $s \in \Omega$. The states $s \in \Omega$ may be interpreted as the possible classical states of the
system described by $\mathcal{H}$. In general, the state of the system, $|\alpha\rangle$, is a unit vector in the
Hilbert space $\mathcal{H}$, and can be written as $|\alpha\rangle = \sum_{s \in \Omega} \alpha_s |s\rangle$, where $\sum_{s \in \Omega} |\alpha_s|^2 = 1$.
$|\alpha^*\rangle$ denotes the conjugate and $\langle\alpha|$ denotes the conjugate transpose of $|\alpha\rangle$. $\langle\beta|\alpha\rangle$
denotes the inner product of $|\alpha\rangle$ and $|\beta\rangle$. For more details on quantum computing see
e.g. [NC00]. A quantum system can undergo two basic operations: unitary evolution and measurement.
Unitary evolution: Quantum physics requires that the evolution of quantum states is
unitary, that is the state $|\alpha\rangle$ is mapped to $U|\alpha\rangle$, where $U$ satisfies $U U^\dagger = \mathbb{1}$, and $U^\dagger$
denotes the transpose complex conjugate of $U$. Unitary transformations preserve norms,
can be diagonalized with an orthonormal set of eigenvectors, and the corresponding
eigenvalues are all of absolute value 1.

Measurement: We will describe here only projective (von Neumann) measurements, defined
by a set of orthogonal projectors $\{\Pi_i : i \in I\}$ ($\Pi_i^\dagger = \Pi_i$, $\Pi_i^2 = \Pi_i$ and $\Pi_i \Pi_j = \delta_{ij}\Pi_i$)
such that $\sum_{i \in I} \Pi_i = \mathbb{1}$. The output of the measurement of the state $|\alpha\rangle$ is an element
$i \in I$ with probability $\|\Pi_i|\alpha\rangle\|^2$; we then say that $\Pi_i$ was measured. Moreover, the
new state of the system after the measurement with outcome $i$ is the (normalized) state
$\|\Pi_i|\alpha\rangle\|^{-1}\,\Pi_i|\alpha\rangle$. We denote the projector on one basis state $|v\rangle$ by $|v\rangle\langle v|$.
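The measurement rule can be sketched in numpy as follows (an illustration, not code from the paper). The helper name `projective_measurement` is hypothetical. The example uses the two-outcome measurement $\{|v\rangle\langle v|, \mathbb{1} - |v\rangle\langle v|\}$, i.e. the question “Is the position $v$ or not?”, on a toy four-dimensional space chosen for illustration.

```python
import numpy as np


def projective_measurement(alpha, projectors):
    """Return (probability, post-measurement state) for each projector Pi_i.

    probability = ||Pi_i alpha||^2, post-state = Pi_i alpha / ||Pi_i alpha|| (None if p = 0).
    """
    results = []
    for P in projectors:
        v = P @ alpha
        p = float(np.vdot(v, v).real)
        results.append((p, v / np.sqrt(p) if p > 0 else None))
    return results


# Two-outcome measurement "Is the position v or not v?" on a toy 4-dim space.
dim, v = 4, 3
Pi0 = np.zeros((dim, dim))
Pi0[v, v] = 1.0                      # |v><v|
Pi1 = np.eye(dim) - Pi0              # 1 - |v><v|
alpha = np.full(dim, 0.5, dtype=complex)
(p0, post0), (p1, post1) = projective_measurement(alpha, [Pi0, Pi1])
```

Here the outcome $\Pi_1$ (“not $v$”) leaves a renormalized state that still carries coherent amplitude on the remaining basis states, which is what makes this partial measurement gentler than a full position measurement.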
Combining two quantum systems: If $\mathcal{H}_A$ and $\mathcal{H}_B$ are the Hilbert spaces of two systems,
$A$ and $B$, then the joint system is described by the tensor product of the Hilbert spaces,
$\mathcal{H}_A \otimes \mathcal{H}_B$. If the basis states for $\mathcal{H}_A$, $\mathcal{H}_B$ are $\{|a\rangle\}$, $\{|v\rangle\}$, respectively, then the basis
states of $\mathcal{H}_A \otimes \mathcal{H}_B$ are $\{|a\rangle \otimes |v\rangle\}$. We use the abbreviated notation $|a,v\rangle$ for the state
$|a\rangle \otimes |v\rangle$. This coincides with the interpretation by which the set of basis states of the
combined system is spanned by all possible classical configurations of the two
classical systems $A$ and $B$.



2.3 Discrete-Time Quantum Random Walk 

It is not possible to define the quantum random walk naively in analogy to the classical 
walk as a move in all directions “in superposition”. It is easy to verify [Mey96] that a 
translationally invariant walk which preserves unitarity is necessarily proportional to a 
translation in one direction. If the particle has an extra degree of freedom that assists 
in its motion, however, then it is possible to define more interesting homogeneous local 
unitary processes. Following [AA+01] we call the extra space the “coin-space” alluding 
to the classical coin that decides upon the walk direction. 
More specifically let $G(V,E)$ be a graph, and let $\mathcal{H}_V$ be the Hilbert space spanned by
states $|v\rangle$ where $v \in V$. We denote by $N$, or $|V|$, the number of vertices in $G$. We will only
consider $d$-regular graphs $G$ here, but slightly modified definitions can be made in the
general case. Let $\mathcal{H}_C$ be the “coin”-Hilbert space of dimension $d$ spanned by the states
$|1\rangle$ through $|d\rangle$. Let $C$ be a unitary transformation on $\mathcal{H}_C$ (the “coin-tossing operator”
which we will define later). Label each directed edge with a number between 1 and $d$,
such that for each $a$, the directed edges labeled $a$ form a permutation. For Cayley graphs
the labeling of a directed edge is simply the generator associated with the edge. Now
we can define a shift operator $S$ on $\mathcal{H}_C \otimes \mathcal{H}_V$ such that $S|a,v\rangle = |a,u\rangle$, where $u$ is the
$a$-th neighbor of $v$. Note that since the edge labeling is a permutation, $S$ is unitary. One
step of the quantum walk is given by a local transformation acting on the coin-space
only, followed by a conditional shift which leaves the coin-space unchanged [AA+01]:
$$U = S \cdot (C \otimes \mathbb{1}_N).$$
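To make the construction concrete, the following numpy sketch (our own illustration; the index convention $a \cdot 2^n + v$ for $|a,v\rangle$ and the helper names are assumptions of the sketch) builds $S$ and $U = S\cdot(C\otimes\mathbb{1}_N)$ for the hypercube, where the $a$-th neighbour of $v$ is $v \oplus e_a$. Because $S$ is a permutation matrix, $U$ is unitary for any unitary coin $C$; the example uses a random unitary obtained from a QR decomposition.

```python
import numpy as np


def shift_operator(n: int) -> np.ndarray:
    """S|a, v> = |a, v xor e_a> on H_C ⊗ H_V, with |a, v> stored at index a * 2**n + v."""
    N = 2 ** n
    S = np.zeros((n * N, n * N))
    for a in range(n):
        for v in range(N):
            S[a * N + (v ^ (1 << a)), a * N + v] = 1.0
    return S


def walk_operator(C: np.ndarray) -> np.ndarray:
    """One step U = S (C ⊗ 1_N): toss the coin at every vertex, then shift."""
    n = C.shape[0]
    return shift_operator(n) @ np.kron(C, np.eye(2 ** n))


rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
C, _ = np.linalg.qr(M)               # a random 3 x 3 unitary coin
U = walk_operator(C)                 # 24 x 24 walk operator on the 3-cube
```

`np.kron(C, I)` places the coin index first, matching the ordering $\mathcal{H}_C \otimes \mathcal{H}_V$ used in the text.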

Random Walk on the Hypercube: The hypercube of dimension $n$ is a Cayley
graph with $N = 2^n$ vertices. The position states are bit-strings $|x\rangle$ of length $n$. The directions
can be labeled by the $n$ basis-vectors $\{|1\rangle, \dots, |n\rangle\}$, corresponding to the $n$ vectors
of Hamming weight 1, $\{|e_1\rangle, \dots, |e_n\rangle\}$, where $e_i$ has a 1 in the $i$th position.

To mimic the permutation symmetry of the classical simple random walk we need
to define the $n \times n$ coin operator $C$ such that $U$ is invariant to permutations of bits. As
pointed out in [MR02] the symmetry of the hypercube defines the coin operator $C$ to be
of the form $C_{ij} = a$ if $i = j$ and $C_{ij} = b$ if $i \ne j$, with two parameters $a, b \in \mathbb{C}$. Unitarity of
$C$ further imposes two quadratic constraints on $a$ and $b$, so that finally, up to an overall
phase, all symmetric coins are characterized by one real parameter $1 - 2/n \le |a| \le 1$.
Among all these coins the one farthest away from the identity operator $\mathbb{1}_n$ is given by
$a = 2/n - 1$ and $b = 2/n$ [MR02]. We will call this latter coin $G$ and use it as our coin in
the rest of this paper. It is not hard to see that using another coin (with constant $a$, $b$) from
the set of permutation invariant coins (except $\mathbb{1}_n$ of course) only slows down the walk
by a constant factor and does not change the order of magnitude of the hitting behavior.
To respect symmetry we will also impose permutation invariance for the initial state of
the walk.
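The properties of the coin $G$ claimed above are quick to confirm numerically. The sketch below (an illustration with a hypothetical helper name) builds $G = \frac{2}{n}J - \mathbb{1}_n$, with $J$ the all-ones matrix, and checks that it is unitary, has $a = 2/n - 1$ and $b = 2/n$, commutes with an arbitrary permutation of the directions, and leaves the uniform superposition over directions unchanged.

```python
import numpy as np


def grover_coin(n: int) -> np.ndarray:
    """The symmetric coin G with G_ii = a = 2/n - 1 and G_ij = b = 2/n for i != j."""
    return (2.0 / n) * np.ones((n, n)) - np.eye(n)


n = 5
G = grover_coin(n)
perm = np.eye(n)[[2, 0, 4, 1, 3]]    # an arbitrary permutation matrix
uniform = np.ones(n) / np.sqrt(n)    # uniform superposition over the n directions
```

Since $J^2 = nJ$, one gets $G^2 = \frac{4}{n^2}nJ - \frac{4}{n}J + \mathbb{1}_n = \mathbb{1}_n$, so $G$ is a real symmetric involution and hence unitary.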

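Definition 1 (Discrete time walk on the hypercube). The symmetric discrete time
walk $U$ on the $n$-dimensional hypercube acts on the $n \cdot 2^n$-dimensional space $\mathcal{H}^n \otimes \mathcal{H}^{2^n}$
as $U = S \cdot (G \otimes \mathbb{1}_{2^n})$, where the shift operator $S$ is defined as $S : |i,x\rangle \mapsto |i, x \oplus e_i\rangle$,
i.e. $S = \sum_{i=1}^n |i\rangle\langle i| \otimes S_i$ with $S_i|x\rangle = |x \oplus e_i\rangle$. The initial state of the walk is chosen to be
symmetric with respect to bit-permutations. For a walk starting in $|x\rangle$ the initial state is
$$|\Psi_{in}\rangle \otimes |x\rangle = \frac{1}{\sqrt{n}} \sum_{i=1}^n |i\rangle \otimes |x\rangle.$$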

Note that this discrete-time quantum walk reduces to the classical symmetric walk if
we perform a measurement in the coin-space in the direction-basis after every step of
the walk. The resulting classical walk with last step in direction $i$ will uniformly change
to one of the $n-1$ directions $j \ne i$ with probability $|b|^2 = 4/n^2$ each and will return back
to the node it came from (direction $i$) with probability $|a|^2 = 1 - 4/n + 4/n^2$. This type
of classical random walk has a “direction-memory” one step back in time, but can be
modeled by a (memoryless) Markov chain if we add a directional space to the position
space. In other words each node $v$ is blown up into $n$ nodes $v_i$, where $i$ is the direction
the walk came from. The resulting walk has a preference to oscillate back and forth
between two adjacent nodes and clearly still has an exponential hitting time from
one corner to its opposite.
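The direction-memory chain can be written down and solved exactly on small cubes. The sketch below is an illustration, not taken from the paper. It builds the Markov chain on the blown-up states $(i, x)$ described above and solves the standard linear system for expected hitting times of the opposite corner. The uniformly random initial direction label is an assumption of the sketch. For $n = 4$ one has $|a|^2 = |b|^2 = 1/4$, so the chain coincides with the simple random walk and its hitting time equals the simple-walk value $64/3$.

```python
import numpy as np


def memory_walk_hitting_time(n: int) -> float:
    """Expected time for the coin-measured Grover walk to go from 0...0 to 1...1.

    State (i, x): the walk is at vertex x and its last step was along direction i.
    The next direction j is drawn with probability |G_ji|^2: |a|^2 = (1 - 2/n)^2 for
    j = i (step back) and |b|^2 = (2/n)^2 for each j != i.
    Assumption: the initial direction label at 0...0 is uniformly random.
    """
    N = 2 ** n
    target = N - 1
    a2, b2 = (1 - 2 / n) ** 2, (2 / n) ** 2
    A = np.eye(n * N)                # becomes (I - P) on non-target rows, I on target rows
    rhs = np.ones(n * N)
    for i in range(n):
        for x in range(N):
            s = i * N + x
            if x == target:
                rhs[s] = 0.0         # h = 0 once the target vertex is reached
                continue
            for j in range(n):
                A[s, j * N + (x ^ (1 << j))] -= a2 if j == i else b2
    h = np.linalg.solve(A, rhs)
    return float(np.mean([h[i * N] for i in range(n)]))


for n in range(3, 8):
    print(n, memory_walk_hitting_time(n))
```

The system has $n \cdot 2^n$ unknowns, so this direct approach is only meant for small $n$; it shows the growth rather than proving it.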

The walk as defined is periodic: nodes with even Hamming weight are visited at
even times only, nodes with odd Hamming weight at odd times. The inclusion of a
“resting” coin-state $|0\rangle$ and an $(n+1) \times (n+1)$ coin allowing for a self-loop transition
amplitude of $a = 2/(n+1) - 1$ makes this walk aperiodic. To simplify the analysis we will
only show the results for the periodic case, though; they hold with very slight
modification in the aperiodic case as well.



3 Hitting Times on the Hypercube 

For classical random walks the hitting time of a node v of a walk starting at an initial 
node i is defined as the expected time it takes the walk to reach v for the first time 
starting from i. Alternatively one can let the classical walk stop upon reaching the node
v and define the stopping-time of the walk as the expected time for this walk to stop. In
the classical case both notions are clearly the same. Care has to be taken to define an
analogous notion for a quantum walk. To define “reaching” v we have to circumvent the
measurement problem. Namely, if we were to measure the position of the walk after each
step we would kill the quantum coherences and collapse the walk onto the corresponding
classical walk. There are two alternatives: either to let the walk evolve and measure
the position of the walk after T iterations (“one-shot measurements”), or to perform
a partial measurement, described by the two projectors $\Pi_0 = |v\rangle\langle v|$ and $\Pi_1 = \mathbb{1} - \Pi_0$
(where $|v\rangle$ is some specific position we wish to “hit”), after every step of the iteration
(“concurrent measurement”). A priori these two notions can be very different in the
quantum case.

Definition 2 (One-shot hitting time). A quantum random walk $U$ has a $(T,p)$ one-shot
$(|\phi_0\rangle, |x\rangle)$ hitting time if the probability to measure state $|x\rangle$ at time $T$ starting in
$|\phi_0\rangle$ is at least $p$, i.e. $|\langle x|U^T|\phi_0\rangle|^2 \ge p$.



Definition 3 ($|x\rangle$-stopped walk). A $|x\rangle$-stopped walk from $U$ starting in state $|\phi_0\rangle$ is the
process defined as the iteration of a measurement with the two projectors $\Pi_0 = \Pi_x =
|x\rangle\langle x|$ and $\Pi_1 = \mathbb{1} - \Pi_0$ and, if $\Pi_1$ is measured, an application of $U$. If $\Pi_0$ is measured
the process is stopped.



Definition 4 (Concurrent hitting time). A quantum random walk $U$ has a $(T,p)$ concurrent
$(|\phi_0\rangle, |x\rangle)$ hitting-time if the $|x\rangle$-stopped walk from $U$ and initial state $|\phi_0\rangle$ has
a probability $\ge p$ of stopping at a time $t \le T$.
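A direct simulation makes Def. 3 and Def. 4 concrete. The numpy sketch below is our own illustration; the array layout `psi[i, x]` for the amplitude of $|i, x\rangle$ and the helper names are assumptions. It runs the symmetric hypercube walk of Def. 1 from $|\Psi_{in}\rangle \otimes |00\dots0\rangle$ and, before every step, performs the two-outcome measurement with $\Pi_0$ the projector onto position $11\dots1$. It keeps the unnormalised $\Pi_1$ branch, so the probability of stopping exactly at time $t$ is the squared norm removed at that step, and the stopping probabilities plus the surviving norm always sum to 1.

```python
import math
import numpy as np


def walk_step(psi: np.ndarray) -> np.ndarray:
    """One step U = S (G ⊗ 1): Grover coin G = (2/n)J - I at every vertex, then |i, x> -> |i, x xor e_i>."""
    n, N = psi.shape
    psi = (2.0 / n) * psi.sum(axis=0, keepdims=True) - psi
    idx = np.arange(N)
    return np.stack([psi[i, idx ^ (1 << i)] for i in range(n)])


def stopped_walk(n: int, T: int):
    """|1...1>-stopped walk from |Psi_in> ⊗ |0...0>, measured at times t = 0, ..., T.

    Returns (stop_probs, survive): stop_probs[t] is the probability of stopping exactly
    at time t, survive the remaining squared norm after the measurement at time T.
    """
    N = 2 ** n
    target = N - 1
    psi = np.zeros((n, N), dtype=complex)
    psi[:, 0] = 1 / math.sqrt(n)
    stop_probs = []
    for t in range(T + 1):
        stop_probs.append(float(np.sum(np.abs(psi[:, target]) ** 2)))  # ||Pi_0 psi||^2
        psi[:, target] = 0.0                                            # keep Pi_1 psi
        if t < T:
            psi = walk_step(psi)
    survive = float(np.sum(np.abs(psi) ** 2))
    return stop_probs, survive


n = 6
T = n + 2 * round((math.pi * n / 2 - n) / 2)   # closest integer to (pi/2) n with T = n (mod 2)
probs, survive = stopped_walk(n, T)
p_T = sum(probs)                                # probability to stop at some t <= T
```

Brute-force simulation stores $n \cdot 2^n$ amplitudes, so it only serves to illustrate the definitions on small cubes.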

These two notions presuppose very different behavior of an algorithm exploiting them.
In the one-shot case we have to know exactly when to measure the walk, which usually
means that we have to know the dimension of the hypercube or, in more general
applications, the shape of the graph. The advantage of the concurrent case is that we do not
need any knowledge of when the walk will “hit” the target state. We simply query the walk
at the target state after every step until we measure a “hit”, so we need no a priori
information about the graph. This is probably ultimately more useful for algorithmic
applications.

Note also that in the concurrent case, if $(T, p)$ is a hitting-time then $(T', p)$ is also a
hitting-time for every $T' \ge T$, i.e. hitting with probability at least $p$ is a monotone property in
time. In the one-shot case this is not at all true; we will see that for the hypercube there
are certain windows in time where the probability to measure a certain node is high,
followed by times where this probability is very low, yet another difference from the
classical case.



3.1 One-Shot Hitting Time 



We will now state and prove our first main result for the symmetric discrete-time quantum
walk $U$ on the hypercube of dimension $n$. Times $T$ and $t$ are always understood to
be the closest integers of the same parity as $n$. We denote by $\bar x$ the opposite corner to
$x$ (obtained by flipping all bits).



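Theorem 1. $U$ has a $(T,p)$ one-shot $(|x\rangle, |\bar x\rangle)$ hitting time with
(1) $T = \frac{\pi}{2} n$ and $p = 1 - O\big(\frac{\log^3 n}{n}\big)$,
(2) $T = \frac{\pi}{2} n \pm O(n^\beta)$ and $p = 1 - O\big(\frac{\log n}{n^{1-2\beta}}\big)$ with $0 < \beta < 1/2$,
(3) $T \in \big[\frac{\pi}{2} n - O\big(\sqrt{n/\log n}\big), \frac{\pi}{2} n + O\big(\sqrt{n/\log n}\big)\big]$ and $p = 1 - O\big(\frac{\log\log n}{\log n}\big)$.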



Remark: The window of width $O(\sqrt{n/\log n})$ in part (3) around the exact one-shot measurement
time of $\pi n/2$ makes the algorithm more robust to slight perturbations in the exact time of the measurement.

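Proof of Theorem 1: Note that by the symmetry of the hypercube and the walk $U$
the hitting time is the same for all $(|x\rangle, |\bar x\rangle)$ with $x \in \{0,1\}^n$. So w.l.o.g. we set $|x\rangle =
|00\dots0\rangle$. We will use the following facts from [MR02]: The $n \cdot 2^n$ eigenstates of $U$ are
of the form $|v_k^i\rangle \otimes |\tilde k\rangle$, where $|\tilde k\rangle = \frac{1}{\sqrt{2^n}}\sum_{x \in \{0,1\}^n} (-1)^{k \cdot x}|x\rangle$ is the $\mathbb{Z}_2^n$-Fourier transform
of $|k\rangle$ for $k \in \mathbb{Z}_2^n$, and the $n$ vectors $\{|v_k^i\rangle : i = 1 \dots n\}$ for each $k$ are the eigenvectors of the
matrix $S_k \cdot G$, where $S_k$ is the diagonal $n \times n$ matrix with $(S_k)_{lm} = \delta_{lm}(-1)^{k_l}$.

The symmetric initial state is $|\Phi_0\rangle := |\Psi_{in}\rangle \otimes |00\dots0\rangle := \frac{1}{\sqrt{n}}\sum_{i=1}^n |i\rangle \otimes |00\dots0\rangle$ (see
Def. 1). For all $k$, only two of the $n$ eigenvectors $|v_k^i\rangle$ have non-zero inner product with
$|\Psi_{in}\rangle$ [MR02]. These two eigenvectors are complex conjugates, call them $|w_k\rangle$ and $|w_k^*\rangle$,
and their corresponding eigenvalues are $\lambda_k$ and $\lambda_k^*$ with
$$\lambda_k = 1 - \frac{2|k|}{n} + \frac{2i}{n}\sqrt{|k|(n-|k|)},$$
where $|k|$ is the Hamming weight of $k$. Let $\lambda_k = e^{i\omega_{|k|}} = \cos\omega_{|k|} + i\sin\omega_{|k|}$, where
$\cos\omega_m = 1 - 2m/n$. The entries of $|w_k\rangle$ are $(w_k)_l = 1/\sqrt{2|k|}$ if $k_l = 1$ and
$(w_k)_l = -i/\sqrt{2(n-|k|)}$ if $k_l = 0$. (For $k = 00\dots0$ and $k = 11\dots1$ there is only one eigenvector, the
uniform superposition over all directions, with eigenvalue $\lambda_{00\dots0} = 1$ and $\lambda_{11\dots1} = -1$,
respectively. When we write out the general eigenvectors this special case will be
self-understood.) The initial state is a superposition over $2^{n+1} - 2$ eigenvectors,
$$|\Phi_0\rangle = \frac{1}{\sqrt{2^n}}\sum_{k \in \{0,1\}^n}\big(a_k|w_k\rangle + a_k^*|w_k^*\rangle\big) \otimes |\tilde k\rangle \quad\text{with}\quad a_k = \langle w_k|\Psi_{in}\rangle = \frac{\sqrt{|k|} + i\sqrt{n-|k|}}{\sqrt{2n}},$$
so that $|a_k|^2 = 1/2$. Let us denote by $|\Phi_t\rangle = U^t(|\Psi_{in}\rangle \otimes |00\dots0\rangle)$ the
state of the system after $t$ iterations. Note that because both the walk $U$ and its initial
state preserve the bit-permutation symmetry of the hypercube, the only consistent
coin-state for position $|11\dots1\rangle$ is the completely symmetric state over all directions,
$\frac{1}{\sqrt{n}}\sum_{i=1}^n |i\rangle = |\Psi_{in}\rangle$. Let us call $|f\rangle = |\Psi_{in}\rangle \otimes |11\dots1\rangle$ the “target” state. With these
quantities in place, $\alpha_t$, the amplitude at time $t$ of the particle being in $|11\dots1\rangle$, the
opposite corner, is
$$\alpha_t := \langle f|\Phi_t\rangle = \frac{1}{2^n}\sum_{k \in \{0,1\}^n}(-1)^{|k|}\,|a_k|^2\big(\lambda_k^t + (\lambda_k^*)^t\big) = \frac{1}{2^n}\sum_{k \in \{0,1\}^n}(-1)^{|k|}\cos(\omega_{|k|}t) = \frac{1}{2^n}\sum_{m=0}^{n}\binom{n}{m}(-1)^m\cos(\omega_m t). \tag{1}$$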
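Eq. (1) is easy to check numerically. The sketch below is an illustration under the conventions of Def. 1, and its helper names are hypothetical. It evaluates the right-hand side of (1) and compares it with a brute-force simulation of $U$ on small cubes. It also evaluates $p = |\alpha_T|^2$ from (1) at the integer $T$ of the same parity as $n$ closest to $\frac{\pi}{2}n$, which Theorem 1 predicts to be close to 1 for large $n$.

```python
import math
import numpy as np


def alpha_formula(n: int, t: int) -> float:
    """Right-hand side of Eq. (1): 2^-n sum_m C(n,m) (-1)^m cos(omega_m t), cos(omega_m) = 1 - 2m/n."""
    total = 0.0
    for m in range(n + 1):
        omega = math.acos(1 - 2 * m / n)
        total += math.comb(n, m) / 2 ** n * (-1) ** m * math.cos(omega * t)
    return total


def alpha_simulated(n: int, t: int) -> complex:
    """<f| U^t |Psi_in, 0...0> by brute force; psi[i, x] is the amplitude of |i, x>."""
    N = 2 ** n
    psi = np.zeros((n, N), dtype=complex)
    psi[:, 0] = 1 / math.sqrt(n)
    idx = np.arange(N)
    for _ in range(t):
        psi = (2.0 / n) * psi.sum(axis=0, keepdims=True) - psi      # Grover coin G at every vertex
        psi = np.stack([psi[i, idx ^ (1 << i)] for i in range(n)])  # shift |i, x> -> |i, x xor e_i>
    return complex(psi[:, N - 1].sum() / math.sqrt(n))             # overlap with |f> = |Psi_in, 1...1>


def one_shot_time(n: int) -> int:
    """Integer closest to (pi/2) n with the same parity as n."""
    return n + 2 * round((math.pi * n / 2 - n) / 2)


for n in (50, 100, 200):
    T = one_shot_time(n)
    print(n, T, alpha_formula(n, T) ** 2)   # one-shot hitting probability p = |alpha_T|^2
```

For $n = 4$ and $t = 4$ both functions give $0.75$, which also matches the direct path count $(n-1)!\,(2/n)^{n-1}$ for reaching the opposite corner in exactly $n$ steps.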



Claim 1. For $t \in [\frac{\pi}{2}n - O(n^\beta), \frac{\pi}{2}n + O(n^\beta)]$ s.t. $t - n$ is even, $|\alpha_t|$ is lower bounded by
$1 - O\big(\frac{\log n}{n^{1-2\beta}}\big)$ for $0 < \beta < 1/2$.

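Proof of Claim 1: Let us split the sum (1) into two parts, one where the index $m \in
M := [(1-\delta)n/2, (1+\delta)n/2]$ and one where $m \notin M$, with $\delta < 1$ specified later. By
standard Chernoff bounds on the tail probabilities of the binomial distribution we can
upper-bound the absolute value of all the contributions from outside $M$ as
$$\Big|\frac{1}{2^n}\sum_{m \notin M}\binom{n}{m}(-1)^m\cos(\omega_m t)\Big| \le \frac{1}{2^n}\sum_{m \notin M}\binom{n}{m} \le 2e^{-\delta^2 n/2}. \tag{2}$$
Let us set $\delta = \sqrt{g(n)/n}$ with $g(n) = \Omega(\log n)$, in which case (2) is upper bounded by
$2e^{-g(n)/2} = 2e^{-\Omega(\log n)/2}$. Write $t = \frac{\pi}{2}n \pm \varepsilon$ (i.e. $\varepsilon = O(n^\beta)$). The second part of the sum
comes from the contributions $m \in M$, for which the terms $\cos\omega_m = 1 - 2m/n \in [-\delta, \delta]$
are small. Call $\nu_m = \frac{\pi}{2} - \omega_m$, so $\cos\omega_m = \cos(\frac{\pi}{2} - \nu_m) = \sin\nu_m = \nu_m - O(\nu_m^3)$, which means
$\nu_m = 1 - 2m/n \pm O(\delta^3)$. Then, using $\omega_m t = \frac{\pi}{2}t - \nu_m t$ and $t = \frac{\pi}{2}n \pm \varepsilon$,
$$\cos(\omega_m t) = \cos\Big[\big(\tfrac{t-n}{2} + m\big)\pi \mp \varepsilon\big(1 - \tfrac{2m}{n}\big) \pm O(n\delta^3)\Big] = (-1)^{\frac{t-n}{2}}(-1)^m\cos\Big[\varepsilon\big(1 - \tfrac{2m}{n}\big) \pm O(n\delta^3)\Big] = (-1)^{\frac{t-n}{2}}(-1)^m\big[1 - O(\varepsilon^2\delta^2) - O(n^2\delta^6)\big] \tag{3}$$
and the second sum is
$$\frac{1}{2^n}\sum_{m \in M}\binom{n}{m}(-1)^m\cos(\omega_m t) = (-1)^{\frac{t-n}{2}}\big[1 - O(\varepsilon^2\delta^2) - O(n^2\delta^6)\big]\frac{1}{2^n}\sum_{m \in M}\binom{n}{m}.$$
Since $\frac{1}{2^n}\sum_{m \in M}\binom{n}{m} \ge 1 - 2e^{-g(n)/2}$, we have
$$|\alpha_t| \ge \Big|\frac{1}{2^n}\sum_{m \in M}\binom{n}{m}(-1)^m\cos(\omega_m t)\Big| - \Big|\frac{1}{2^n}\sum_{m \notin M}\binom{n}{m}(-1)^m\cos(\omega_m t)\Big| \ge 1 - O\Big(\frac{\varepsilon^2 g(n)}{n}\Big) - O\Big(\frac{g(n)^3}{n}\Big) - 4e^{-g(n)/2}. \tag{4}$$
Set $g(n) = 2\log n$ to prove the claim for $0 < \beta < 1/2$. □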

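To prove Theorem 1 note that the probability of measuring the system in $|11\dots1\rangle$
is $p = |\alpha_T|^2$. Set $\beta = \frac{1}{2}\big(1 - \frac{\log\log n}{\log n}\big)$, so that $n^\beta = \sqrt{n/\log n}$, and use Eq. (4) with
$g(n) = 2\log\log n$ to get $p \ge 1 - O\big(\frac{\log\log n}{\log n}\big)$. For $\beta = 0$ set $g(n) = 2\log n$ to get a lower
bound of $1 - O(\log^3 n/n)$. □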
Remark: Note that if we set $T = (2m+1)\pi n/2$ we obtain a similar result to the
$m = 0$ case as long as $T$ is sufficiently small so that the $O(T^2\delta^6)$ terms do not matter, i.e.



Discrete Quantum Walks Hit Exponentially Faster 363 



$T^2\delta^6 = o(1)$. We can think of the walk returning to $|11\dots1\rangle$ every $\pi n$ steps, which is in
stark contrast to the classical case, where the expected return time to some node $i$
is $1/\pi_i = 2^n$ (see Sec. 2.1). Observe that hitting with probability at least
$p$ is not a monotone property.

3.2 Concurrent Hitting Time 

Our second result relates to the concurrent version of hitting time for the symmetric 
walk U on the hypercube of dimension n. It implies that even without information on 
when to measure we retain a polynomial hitting behavior:

Theorem 2. $U$ has a $\big(\frac{\pi}{2}n, \Omega\big(\frac{1}{n\log^2 n}\big)\big)$ concurrent $(|x\rangle, |\bar x\rangle)$ hitting time.

Remarks: (1) Amplification: If the probability $p$ in Defs. 2 and 4 is an inverse polynomial
$p(n)$ in the size of the instance, we can use standard classical amplification to
boost this probability close to 1. We just restart the random walk from scratch and repeat
it $O(1/p(n))$ times to reach any constant success probability; $O(n/p(n))$ repetitions make
the failure probability exponentially small in $n$. So the amplified coined symmetric
discrete-time quantum walk on the hypercube of dimension $n$ has an $(O(n^2\log^2 n), 1 - \epsilon)$
concurrent $(|x\rangle, |\bar x\rangle)$ hitting time for any constant $\epsilon > 0$.
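The amplification arithmetic is the usual independent-restart bound: $k$ restarts all fail with probability $(1-p)^k \le e^{-pk}$. A minimal Python helper (hypothetical, for illustration only) that picks $k$ from this bound is sketched below. The usage line plugs in $p(n) = 1/(n\log^2 n)$ literally, which ignores the constant hidden in the $\Omega(\cdot)$ of Theorem 2 and is therefore only an order-of-magnitude illustration.

```python
import math


def restarts_needed(p: float, delta: float) -> int:
    """A number k of independent restarts with (1 - p)**k <= exp(-p k) <= delta."""
    if not (0 < p <= 1 and 0 < delta < 1):
        raise ValueError("need 0 < p <= 1 and 0 < delta < 1")
    return math.ceil(math.log(1 / delta) / p)


# Illustration only: p(n) = 1 / (n log^2 n), ignoring the constant hidden in Omega(.)
n = 64
p = 1 / (n * math.log(n) ** 2)
k = restarts_needed(p, 1e-3)
total_steps = k * math.ceil(math.pi * n / 2)   # each restart runs about (pi/2) n steps
```

Because $k$ scales with $\log(1/\delta)/p$, a constant target failure probability costs $O(1/p(n))$ restarts, while an exponentially small one costs an extra factor of order $n$.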

(2) To be fair we should compare our results to tail-bounds for the hitting time in
the classical case. It is very easy to show, however, that for the simple random walk
on the hypercube starting in a node $i$ the probability to hit the opposite corner $j$ in a
polynomial number of steps is exponentially small, since each of the probabilities $r_{ij}^t$ of
first reaching $j$ at time $t$ (see Sec. 2.1) is exponentially small.

Proof of Theorem 2: The strategy of the proof is to compare the hitting probabilities
at time $t$ of the $|11\dots1\rangle$-stopped walk to the unmeasured walk and to show that the
perturbation caused by the measurement of the walk only gives a polynomial “loss” in
hitting amplitude.

For the $|11\dots1\rangle$-stopped walk (see Def. 3) the same symmetry arguments as before
apply, since the measurement projectors $\Pi_0$ and $\Pi_1 = \mathbb{1} - \Pi_0$ are also symmetric with
respect to bit permutations. So the only possible “target” state is again
$|f\rangle = \frac{1}{\sqrt{n}}\sum_{i=1}^n |i\rangle \otimes |11\dots1\rangle$ and we may assume that we measure with
$\{\Pi_0 = |f\rangle\langle f|, \Pi_1 = \mathbb{1} - \Pi_0\}$. Let $|\Phi_t\rangle$ and $\alpha_t = \langle f|\Phi_t\rangle$ be the same quantities as before for
the unmeasured walk $U$. Since the walk has non-zero transition amplitude only between nearest
neighbors, the first time $\alpha_t \ne 0$ is for $t = n$, and since the walk is 2-periodic, $\alpha_t = 0$
whenever $t$ and $n$ have different parity.

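Let us define $|\tilde\Phi_t\rangle = (U\Pi_1)^t(|\Psi_{in}\rangle \otimes |00\dots0\rangle)$ as the non-normalised state we get
at time $t$ given the walk has not stopped before $t$, and $\beta_t := \langle f|\tilde\Phi_t\rangle$. Note that for $t \le n$
we have $|\tilde\Phi_t\rangle = |\Phi_t\rangle$ and $\alpha_t = \beta_t$.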

Claim 2. The probability to stop at some time $t \le T$ is given by
$p_T = \sum_{t=0}^{T}|\langle f|\tilde\Phi_t\rangle|^2 = \sum_{t=0}^{T}|\beta_t|^2$.

Proof of Claim 2: As in previous work [AB+01] it is easy to see that calculating with the
non-normalised state gives the unconditional probability to stop at time $t$. If we renormalize
our states we get exactly the conditional probability to stop at time $t$ given we have not
stopped before. □
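We now want to relate the $\alpha_t$ from the unmeasured walk to the actual $\beta_t$ of the
measured walk.

Claim 3. $\beta_{n+k} = \alpha_{n+k} - \sum_{t=0}^{k-1}\beta_{n+t}\,\gamma_{k-t}$ for all $k \ge 0$, with
$\gamma_t = \langle f|U^t|f\rangle$.

Proof of Claim 3: By induction on $k$. We have $|\tilde\Phi_t\rangle = |\Phi_t\rangle$ and $\alpha_t = \beta_t$ for $t \le n$. Further
$|\tilde\Phi_{n+1}\rangle = U|\tilde\Phi_n\rangle - \beta_n U|f\rangle = |\Phi_{n+1}\rangle - \beta_n U|f\rangle$, so $\beta_{n+1} = \langle f|\tilde\Phi_{n+1}\rangle = \alpha_{n+1} - \beta_n\langle f|U|f\rangle$.
Write $|\tilde\Phi_{n+k+1}\rangle = U|\tilde\Phi_{n+k}\rangle - \beta_{n+k}U|f\rangle$ and apply the induction hypothesis, in the form
$|\tilde\Phi_{n+k}\rangle = |\Phi_{n+k}\rangle - \sum_{t=0}^{k-1}\beta_{n+t}U^{k-t}|f\rangle$, to $|\tilde\Phi_{n+k}\rangle$. The claim on $\beta_{n+k+1}$ follows immediately. □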
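Claim 3 can be checked numerically on a small cube. The sketch below is our own illustration, reusing the conventions of Def. 1, and its helper names are hypothetical. It computes $\alpha_t$, $\gamma_t = \langle f|U^t|f\rangle$ and the stopped-walk overlaps $\beta_t$ by brute force, and compares $\beta_t$ with the recursion $\beta_t = \alpha_t - \sum_{\tau<t}\beta_\tau\gamma_{t-\tau}$, which is Claim 3 rewritten using $\beta_\tau = 0$ for $\tau < n$. In the simulation the projector $\Pi_0$ removes the whole position $11\dots1$. By the symmetry argument above this agrees with $|f\rangle\langle f|$.

```python
import numpy as np


def walk_step(psi: np.ndarray) -> np.ndarray:
    """U = S (G ⊗ 1) on amplitudes psi[i, x] of |i, x>."""
    n, N = psi.shape
    psi = (2.0 / n) * psi.sum(axis=0, keepdims=True) - psi
    idx = np.arange(N)
    return np.stack([psi[i, idx ^ (1 << i)] for i in range(n)])


def overlaps_with_f(n: int, start: int, T: int, stopped: bool = False) -> np.ndarray:
    """<f|psi_t> for t = 0..T, with psi_0 = |Psi_in> ⊗ |start> and |f> = |Psi_in> ⊗ |1...1>.

    With stopped=True the component at position 1...1 is projected out after each
    overlap is recorded, i.e. psi_{t+1} = U Pi_1 psi_t as in the |1...1>-stopped walk.
    """
    N = 2 ** n
    psi = np.zeros((n, N), dtype=complex)
    psi[:, start] = 1 / np.sqrt(n)
    out = []
    for _ in range(T + 1):
        out.append(psi[:, N - 1].sum() / np.sqrt(n))
        if stopped:
            psi[:, N - 1] = 0.0
        psi = walk_step(psi)
    return np.array(out)


def beta_from_claim3(alpha: np.ndarray, gamma: np.ndarray) -> np.ndarray:
    """beta_t = alpha_t - sum_{tau < t} beta_tau * gamma_{t - tau}."""
    beta = np.zeros_like(alpha)
    for t in range(len(alpha)):
        beta[t] = alpha[t] - sum(beta[tau] * gamma[t - tau] for tau in range(t))
    return beta


n, T = 5, 25
alpha = overlaps_with_f(n, 0, T)                 # unmeasured walk from 0...0
gamma = overlaps_with_f(n, 2 ** n - 1, T)        # gamma_t = <f|U^t|f>
beta = overlaps_with_f(n, 0, T, stopped=True)    # measured (stopped) walk
```

The recursion only needs the scalar sequences $\alpha_t$ and $\gamma_t$, which is what lets the proof reason about the measured walk through quantities of the unmeasured one.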

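Claim 4. Let $T = \lceil\frac{\pi}{2}n\rceil$ or $\lfloor\frac{\pi}{2}n\rfloor$ s.t. $T - n$ is even, let $0 < 2t < T - n$ and define
$\tilde\gamma_{2t} = (-1)^t\gamma_{2t}$. Then
1. $\gamma_t = \frac{1}{2^n}\sum_{m=0}^{n}\binom{n}{m}\cos(\omega_m t)$; in particular $\gamma_{2t+1} = 0$,
2. $|\tilde\gamma_{2t} - \tilde\gamma_{2(t+1)}| = O(1/\sqrt{n})$,
3. there is a constant $c$ s.t. for $t_c = \lfloor c\sqrt{n}\rfloor$ we have $|\alpha_{T-2t_c}| \le 1/2$.

Proof of Claim 4: Omitted. A complete proof will be given in another version.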

We now can give a lower bound on $|\beta_t|$ in terms of quantities of the unmeasured walk:

Claim 5. Let $t_c$ be as in Claim 4.3. If $\sum_{t=0}^{(T-n)/2}|\beta_{n+2t}| = O(1/\log n)$, then $|\beta_{n+2t}| \ge |\alpha_{n+2t}| -
|\alpha_{n+2t-2}| - o(1/\sqrt{n})$ for $T - n - 2t_c \le 2t \le T - n$.

Proof of Claim 5: Omitted. Will appear in another version.

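If the assumption of Claim 5 is not true, then, by the Cauchy-Schwarz inequality and Claim 2,
$$\Omega\Big(\frac{1}{\log n}\Big) = \sum_{t=0}^{(T-n)/2}|\beta_{n+2t}| \le \sqrt{\tfrac{T-n}{2} + 1}\,\Big(\sum_{t=0}^{(T-n)/2}|\beta_{n+2t}|^2\Big)^{1/2} \le \sqrt{T\,p_T}.$$
The rest of Th. 2 follows from Claim 2 and Claim 5: with $t_c = \lfloor c\sqrt{n}\rfloor$ as in Claim 4.3,
$$p_T = \sum_{t=n}^{T}|\beta_t|^2 \ge \sum_{s=0}^{t_c-1}|\beta_{T-2s}|^2 \ge \frac{1}{t_c}\Big(\sum_{s=0}^{t_c-1}|\beta_{T-2s}|\Big)^2 \ge \frac{1}{t_c}\Big(\sum_{s=0}^{t_c-1}\big(|\alpha_{T-2s}| - |\alpha_{T-2s-2}| - o(\tfrac{1}{\sqrt{n}})\big)\Big)^2 = \frac{\big(|\alpha_T| - |\alpha_{T-2t_c}| - o(1)\big)^2}{t_c} \ge \frac{\big(|\alpha_T| - 1/2 - o(1)\big)^2}{c\sqrt{n}}.$$
From Theorem 1 we know $|\alpha_T| = 1 - O(\log^3 n/n)$, which establishes $p_T \ge \frac{(1/2 - o(1))^2}{c\sqrt{n}} =
\Omega(1/\sqrt{n})$ if the assumption of Claim 5 is true, or $p_T = \Omega\big(\frac{1}{n\log^2 n}\big)$ if it is not (since $T = O(n)$), in both cases
proving the theorem. □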



4 Dependence on the Initial State 

One might wonder how much this polynomial hitting time depends on the fact that the
walk is from one vertex to exactly the opposite corner of the hypercube. What if the two
states were not exactly opposite? It is easy to see that if we start the walk in a neighbor
of $|x\rangle$ we still obtain a polynomial hitting time to $|\bar x\rangle$, since after one step the walk
spreads evenly to all its neighbors and a polynomial ($1/\sqrt{n}$) fraction of the amplitude
will be on $|x\rangle$. This in turn implies that after $T = \frac{\pi}{2}n$ steps a polynomial fraction
of the amplitude will be on $|\bar x\rangle$. This type of argument shows that for polynomially
sized neighborhoods of $|x\rangle$ we get polynomial hitting times. But how large can the
“polynomially $|\bar x\rangle$ hitting” region around $|x\rangle$ be? It turns out that a polynomial hitting
time cannot hold in general. We give a limit that comes from the lower bound on
quantum unstructured search ([BB+97]).

Theorem 3. Let $S_x$ be a neighborhood of $x$ (defined e.g. by a cut-off Hamming distance
from $x$) s.t. for $y \in S_x$ the quantum walk has an $(O(\mathrm{poly}(n)), \Omega(1/\mathrm{poly}(n)))$ concurrent
$(|y\rangle, |x\rangle)$ hitting time. Then $|S_x| = O(\mathrm{poly}(n)\cdot\sqrt{2^n})$.

Proof of Theorem 3: We will think of $S_x$ as a ball around $x$, but the neighborhood of
a node can be defined in any arbitrary way; the arguments go through for all of them.
So $S_x = \{y : d_H(x,y) \le d_c\}$, where $d_H$ is the Hamming distance and $d_c$ is a cut-off such
that all $y \in S_x$ have an $(O(p(n)), \Omega(1/q(n)))$ concurrent $(|y\rangle, |x\rangle)$ hitting time. By symmetry,
$|S_x| =: |S|$ is independent of $x$. Let us cover the hypercube with $K$ balls of size $|S|$, where
each of the balls is centered around a node $x_1, x_2, \dots, x_K$. A simple probabilistic argument
shows that we can achieve this with $K = O(n \cdot 2^n/|S|)$ balls. Define a quantum
search algorithm as follows: starting in $|x_1\rangle$ launch an $|x\rangle$-stopped quantum random
walk as in Def. 3, where $|x\rangle$ is the marked state we are searching for. That means at
every step we query the oracle with the current state of the walk and the question “Is
this the marked state or not?”. (We can adapt the standard oracle in Grover’s algorithm
[Gro96] to behave this way by measuring the auxiliary output qubit of the oracle.) We
iterate this quantum walk for $p(n)$ steps and use classical amplification (repeat $q(n)$
many times). We repeat the amplified walk for each initial state $|x_i\rangle$, $i = 1 \dots K$. With
probability close to 1 one of the walks will find the marked state. The whole algorithm
takes $O(p(n)\cdot q(n)\cdot K)$ queries. From the query lower bound of $\Omega(\sqrt{2^n})$ for any
unstructured quantum search algorithm [BB+97] it follows that $K = \Omega(\sqrt{2^n}/\mathrm{poly}(n))$,
which yields the upper bound on $|S|$. □
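For the record, the counting behind the last step can be written out explicitly. This is our own expansion of the argument above, using the same $p(n)$, $q(n)$ and $K$:

```latex
\[
\Omega\big(\sqrt{2^n}\big) \;\le\; \#\text{queries}
  \;=\; O\big(p(n)\,q(n)\,K\big)
  \;=\; O\Big(\frac{n\,p(n)\,q(n)\,2^n}{|S|}\Big)
\quad\Longrightarrow\quad
|S| \;=\; O\big(n\,p(n)\,q(n)\cdot\sqrt{2^n}\big)
    \;=\; O\big(\mathrm{poly}(n)\cdot\sqrt{2^n}\big).
\]
```

Since $p(n)$ and $q(n)$ are polynomials, the extra factor $n$ from the covering argument is absorbed into $\mathrm{poly}(n)$.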



5 Quantum Routing 

Let us apply rapid hitting of the quantum random walk to sequential routing of a packet
in a noisy network with a possible adversary trying to prevent the arrival of the packet.
Assume the time when the packet is launched from node $x$ is given only approximately
to the other nodes. We focus on both robustness of the algorithm against random noise
(edge deletion, faulty nodes) as well as malicious attacks (adversary chooses the most
vulnerable edges/nodes to delete). The nodes of the network are bit-strings of length $n$
and each node is connected to all nodes that differ by exactly one bit, so that the network
has the topology of the hypercube. Consider quantum routing from node $x$ to node $y$ as
follows:

(1) Let $d = d_H(x,y)$. We route on the sub-cube of dimension $d$ spanned by the
support of $x \oplus y$ (i.e. all strings $z$ s.t. $z_i = x_i$ whenever $x_i = y_i$). The coin-space of the
quantum random walk is $d$-dimensional; call the coin operator $C_d$. We assume that each
node $v$ is capable of locally applying $C_d \otimes |v\rangle\langle v|$ (e.g. the bit positions of $x \oplus y$ spanning the sub-cube
are broadcast). Nodes can locally implement the conditional shift (which requires only
interactions between nearest neighbors). Both operations are local in the topology of
the hypercube and can be implemented in a quantum network.

(2) The quantum random walk is applied $T = \frac{\pi}{2}d$ times (rounded appropriately).

(3) After $T$ steps the state of the system is measured. With probability $1 - O(\log^3 d/d)$
the packet is at $y$.

(3’) At each time step node $y$ performs the partial measurement to see if it has
received the packet or not. After $T$ steps the probability that the packet is at $y$ is
$\Omega(1/(n\log^2 n))$. In case of failure the packet can be resent ($O(n\log^2 n)$ times) to boost
the success probability close to 1.
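Steps (1) and (2) involve only a little bookkeeping on bit-strings. The helper below is hypothetical, not from the paper. It encodes nodes as Python integers and extracts the support of $x \oplus y$ that spans the routing sub-cube. It takes “rounded appropriately” to mean the integer of the same parity as $d$ closest to $\frac{\pi}{2}d$, following the convention stated at the start of Sec. 3.1.

```python
import math


def routing_parameters(x: int, y: int):
    """Return (support, d, T) for routing from node x to node y on the hypercube.

    support: bit positions where x and y differ (they span the sub-cube of step (1)),
    d:       the Hamming distance d_H(x, y), i.e. the dimension of that sub-cube,
    T:       the number of walk steps of step (2), taken here as the integer closest
             to (pi/2) d with the same parity as d.
    """
    diff = x ^ y
    support = [i for i in range(diff.bit_length()) if (diff >> i) & 1]
    d = len(support)
    T = d + 2 * round((math.pi * d / 2 - d) / 2)
    return support, d, T


print(routing_parameters(0b1011, 0b0110))  # ([0, 2, 3], 3, 5)
```

Only $x \oplus y$ enters the computation, which mirrors the point made below that the routing nodes need neither the origin nor the destination individually.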

Let us state the quantum advantages of this algorithm when x and y differ in Q(n) 
bits (almost surely for random x and y). 

Classically we could route the packet deterministically (by fixing the path in advance).
We need to broadcast either the path or $x$ (or $y$) so the nodes know which bit to
flip when they receive the packet; the non-exact start makes it otherwise impossible to
deduce this. This strategy is fast ($T = O(d)$) but secure neither against failure of one of
the routing nodes/traversed edges nor against adversarial attacks. It suffices to affect
one node/edge on the fixed path and the routing will fail. A fast randomized algorithm
can flip the necessary bits in some random order. This strategy is robust against deletion
of a subexponential number of random edges or nodes but again requires common
knowledge of $y$. This in turn makes it vulnerable to adversarial attacks (it suffices to
delete all the edges incident to $y$). A fully randomized classical routing algorithm,
corresponding to a simple random walk on the cube, is robust against adversarial attacks
but takes exponential time. It is here that quantum routing has an advantage. The nodes
do not have to know the origin $x$ and destination $y$ of the packet, only $x \oplus y$. In the
one-shot case (3) even the node at $y$ does not have to know that it is the target; only at the
measurement stage will it receive the packet. (This might enforce a more cooperative
behavior of each of the routing nodes since they all could be the target.) Knowledge of
$x \oplus y$ alone is not sufficient to identify the most vulnerable edges (those incident or close
to $x$ and $y$), which reduces the adversary to random noise.

If a subexponential number of edges is deleted at random or a subexponential num- 
ber of random nodes does not cooperate in the process, the success probability of the 
quantum routing algorithm changes only by an exponentially small amount. 

To account for edge deletion we can assume that the deleted edge is replaced by
a self-loop at each of its incident nodes. A faulty node v could apply any local
operation O_v ⊗ |v⟩⟨v| (including measurements) instead of C_d ⊗ |v⟩⟨v|. Almost surely the
deleted edges or faulty nodes will be in a region of the hypercube of Hamming weight
d/2 ± O(√d). In this region there is an exponential number of nodes for each Hamming
weight. Since the walk spreads symmetrically over all states of the same Hamming weight,
the amplitude of each single state is exponentially small, and perturbing a subexponential
number of them in each step can induce only an exponentially small perturbation to
the state of the walk. The walk is only O(d) steps long, so these exponential perturbations
cannot add up to anything significant. □
Note that the fact that all the adversary can do is essentially random allows us to
use this type of argument. If even an exponentially small change at each step happens
outside the region around Hamming weight d/2, the resulting perturbation can be large;
this is precisely the difficulty in proving Theorem 2 from Theorem 1.

It is important to see the quantum routing algorithm not only in terms of its advan- 
tages over classical routing. It is conceivable that quantum nets will be available in the 
near future and new routing strategies might have to be applied e.g. to distribute qubits 
to establish secret keys between certain nodes in the network. Our algorithm is a first 
step in this spirit. 
A Continuous-Time Quantum Random Walk

The continuous-time walk has been defined in [FG98] as a quantum version of the
classical continuous-time walk (see Sec. 2.1). To make the classical continuous walk
with generator Q quantum, one simply sets U(t) = exp(iQt), which is unitary as long as
Q = Q^† (which is the case for simple random walks on undirected graphs). This walk
works directly with the space formed by the nodes of the graph and does not require
auxiliary coin spaces. To date it is not clear how the continuous-time and the discrete-time
walks are related.

For the hypercube the continuous-time quantum walk is described by the following
transformation on the space spanned by n-bit strings [MR02]:

$$U_{\mathrm{cont}}(t) = \exp\Big(\frac{it}{n}\sum_{j=1}^{n} X_j\Big) = \bigotimes_{j=1}^{n} \exp\Big(\frac{it}{n} X_j\Big),$$

where X_j acts only on the jth bit as X_j|0⟩ = |1⟩ and X_j|1⟩ = |0⟩. The expression in the
exponential corresponds to the adjacency matrix of the hypercube (scaled by 1/n). U_cont(t) can be
simulated uniformly by a quantum circuit with O(n) local gates. It is now straightforward
to prove the following theorem:

Theorem 4 (One-shot hitting time). U_cont has a (T = πn/2, P = 1) and a
(T = πn/2 ± O(n^β), P = 1 − O(1/n^{1−2β})) one-shot hitting time for β = const < 1/2.

Proof: Omitted.
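A small state-vector simulation makes the one-shot hitting time concrete. The sketch assumes the normalization U_cont(t) = exp((it/n) Σ_j X_j) and applies it gate by gate, using exp(iθX) = cos θ · I + i sin θ · X on each bit. The bits evolve independently, so starting from |0…0⟩ the amplitude on |1…1⟩ is (i sin(t/n))^n and the hitting probability is sin^{2n}(t/n). At t = πn/2 + δ this equals cos^{2n}(δ/n) ≥ (1 − δ²/n²)^n ≥ 1 − δ²/n (for |δ| ≤ n), which is 1 − O(1/n^{1−2β}) when δ = O(n^β). The code is an illustration in plain Python with cost exponential in n. It is not a circuit construction, and the helper names are hypothetical.

```python
import math


def apply_bit_rotation(state: list[complex], j: int, theta: float) -> list[complex]:
    """Apply exp(i*theta*X_j) = cos(theta) I + i sin(theta) X_j to bit j."""
    c, s = math.cos(theta), math.sin(theta)
    mask = 1 << j
    # (X_j psi)[v] = psi[v XOR mask]: X_j flips bit j of the basis string.
    return [c * state[v] + 1j * s * state[v ^ mask] for v in range(len(state))]


def u_cont(state: list[complex], n: int, t: float) -> list[complex]:
    """U_cont(t) as a product of n single-bit gates exp(i t X_j / n)."""
    for j in range(n):
        state = apply_bit_rotation(state, j, t / n)
    return state


def antipode_probability(n: int, t: float) -> float:
    """Probability of measuring |1...1> at time t when starting from |0...0>."""
    state = [0j] * (1 << n)
    state[0] = 1 + 0j
    state = u_cont(state, n, t)
    return abs(state[(1 << n) - 1]) ** 2


if __name__ == "__main__":
    n = 8
    print(round(antipode_probability(n, math.pi * n / 2), 9))  # 1.0
    print(antipode_probability(n, 1.3), math.sin(1.3 / n) ** (2 * n))
```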

Definition (|x⟩-stopped walk, concurrent hitting time). The |x⟩-stopped walk is
the iterative process in which a measurement {Π_0 = |x⟩⟨x|, Π_1 = 1 − Π_0} is first
performed. If |x⟩ is measured, the walk is stopped; otherwise U_cont(1) is applied and the
procedure is repeated. The walk has a (T, p) concurrent hitting time if the probability of
stopping before time T is at least p.
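The |x⟩-stopped walk can also be simulated classically for small n. The sketch below keeps the unnormalized post-measurement state. In each round it adds the probability of the outcome Π_0 = |x⟩⟨x| to the stopping probability, removes the amplitude on |x⟩ (the Π_1 branch), and applies U_cont(1). The hypercube normalization U_cont(t) = exp((it/n) Σ_j X_j) and the function names are assumptions made for illustration. The returned value is the probability of stopping during the first T measurement rounds.

```python
import math


def u_cont(state: list[complex], n: int, t: float) -> list[complex]:
    """U_cont(t) = product over bits j of exp(i t X_j / n) (normalization assumed)."""
    c, s = math.cos(t / n), math.sin(t / n)
    for j in range(n):
        mask = 1 << j
        state = [c * state[v] + 1j * s * state[v ^ mask] for v in range(len(state))]
    return state


def stop_probability(n: int, start: int, target: int, T: int) -> float:
    """Probability that the |target>-stopped walk stops within T measurement rounds."""
    state = [0j] * (1 << n)
    state[start] = 1 + 0j
    p_stop = 0.0
    for _ in range(T):
        p_stop += abs(state[target]) ** 2  # outcome Pi_0: the walk stops here
        state[target] = 0j                 # continue only on the Pi_1 branch (unnormalized)
        state = u_cont(state, n, 1.0)
    return p_stop


if __name__ == "__main__":
    n = 6
    for T in (5, 10, 20):
        print(T, stop_probability(n, 0, (1 << n) - 1, T))
```

Tracking the unnormalized branch avoids renormalizing after every measurement. The norm removed at each round is exactly the probability mass that stopped.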

Theorem 5 (Concurrent hitting time). The continuous-time walk has a
(T = …) concurrent hitting time.

Proof: Omitted. 
Abstract. We initiate a study of property testing as applied to visual
properties of images. Property testing is a rapidly developing area inves- 
tigating algorithms that, with a small number of local checks, distinguish 
objects satisfying a given property from objects which need to be mod- 
ified significantly to satisfy the property. We study visual properties of 
discretized images represented by n x n matrices of binary pixel values. 
We obtain algorithms with query complexity independent of n for several 
basic properties: being a half-plane, connectedness and convexity. 



1 Introduction 



We chose to investigate connectedness 
because of a belief that this predicate is 
nonlocal in some very deep sense; therefore 
it should present a serious challenge to any 
basically local, parallel type of computation. 

Perceptrons 
Marvin Minsky and Seymour Papert 

Images are typically so large that it is impractical to read every single bit of them. 
It is natural to ask what properties of an image can be detected by sublinear
algorithms that read only a small portion of the image. In general, most problems 
are not solvable exactly with that restriction. Property testing [16,11] (see [15,9] 
for surveys) is a notion of approximation tailored for decision problems and 
widely used for studying sublinear algorithms. Property tests distinguish inputs 
with a given property from those that are far from satisfying the property. Far 
means that many characters of the input must be changed before the property 
arises in it. The query complexity of a property test is the number of characters 
it reads. The goal is to design tests with sublinear complexity. 

Image analysis is one area potentially well suited to the property testing 
paradigm. Some salient features of an image may be tested by examining only a 
small part thereof. Indeed, one motivation for this study is the observation that 
the eye focuses on relatively few places within an image during its analysis. The 
analogy is not perfect due to the eye’s peripheral vision, but it suggests that 
property testing may give some insight into the visual system. 

S. Arora et al. (Eds.): APPROX 2003 + RANDOM 2003, LNCS 2764, pp. 370-381, 2003.
In this paper, we present tests for a few properties of images. All our tests
have complexity independent of the image size, and therefore work well even for
huge images. We use an image representation popular in learning theory (see, e.g.,
[14,13]). Each image is represented by an n × n matrix M of pixel values. We
focus on black and white images given by binary matrices with black denoted
by 1 and white denoted by 0. To keep the correspondence with the plane, we
index the matrix by {0, 1, ..., n − 1}², with the lower left corner being (0, 0) and
the upper left corner being (0, n − 1). The object is a subset of {0, 1, ..., n − 1}²
corresponding to black pixels; namely, {(i, j) | M_{i,j} = 1}.

1.1 Property Testing in the Pixel Model 

The distance between two images of the same size is defined as the number
of pixels (matrix entries) on which they differ. (Two matrices of different size
are considered to have infinite distance.) The relative distance is the ratio of
the distance and the number of pixels in the image. A property is defined as a
collection of pixel matrices. The distance of an image (matrix) M to a property
𝒫 is min_{M' ∈ 𝒫} dist(M, M'). Its relative distance to 𝒫 is its distance to 𝒫 divided
by the size of the image matrix. An image is ε-far from 𝒫 if its relative distance
to 𝒫 is at least ε. If the image is not ε-far from 𝒫, it is ε-close to it.
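The definitions above can be written out directly when a property is given as an explicit finite collection of matrices. This is feasible only for toy sizes: real properties such as "being a half-plane" are far too large to enumerate. The Python sketch below uses hypothetical helper names.

```python
import math


def dist(m1: list[list[int]], m2: list[list[int]]) -> float:
    """Number of pixels on which two images differ; infinite if the sizes differ."""
    if len(m1) != len(m2) or any(len(r1) != len(r2) for r1, r2 in zip(m1, m2)):
        return math.inf
    return sum(a != b for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))


def relative_distance(m: list[list[int]], prop: list[list[list[int]]]) -> float:
    """min over M' in prop of dist(M, M'), divided by the number of pixels of M."""
    pixels = len(m) * len(m[0])
    return min(dist(m, other) for other in prop) / pixels


def is_eps_far(m: list[list[int]], prop: list[list[list[int]]], eps: float) -> bool:
    """An image is eps-far from the property iff its relative distance is at least eps."""
    return relative_distance(m, prop) >= eps


if __name__ == "__main__":
    unicolored = [[[0, 0], [0, 0]], [[1, 1], [1, 1]]]
    image = [[1, 0], [0, 0]]
    print(relative_distance(image, unicolored))  # 0.25
    print(is_eps_far(image, unicolored, 0.25), is_eps_far(image, unicolored, 0.3))  # True False
```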

A property is (ε, q)-testable if there is a randomized algorithm that for every
input matrix M queries at most q entries of M and with probability at least 2/3
distinguishes between matrices with the property and matrices which are ε-far
from having it. The algorithm is referred to as an (ε, q)-test. This definition
allows tests to have 2-sided error. An algorithm has 1-sided error if it always
accepts an input that has the property.



1.2 Our Results 

We present tests for three visual properties: being a half-plane, convexity and
connectedness. The number of queries in all tests is independent of the size of
the input. The algorithm for testing if the input is a half-plane is a 1-sided error
test with 2 ln 3/ε + o(1/ε) queries. The convexity test has 2-sided error and makes
O(1/ε²) queries. Finally, the connectedness test has 1-sided error and makes
O((1/ε²) log²(1/ε)) queries.



1.3 Related Results in Property Testing 

Previous papers on property testing in computational geometry [7,6] consider a 
model different from ours, where the input is the set of object points and a query i 
produces coordinates of the ith point. Their results, in general, are incomparable 
to ours. In their model, the problems we consider would have query complexity 
dependent on the number of points in the object. But they are able to study 
properties which are trivially testable in our model because all instances are 
either close to having the property or close to not having it. An example is the 
property that a given graph is a Euclidean minimum spanning tree of a given
point set in the plane [7]. 

Another related work is [10], which studies properties of d-dimensional matrices.
It gives a class of properties which are testable with a number of queries
polynomial in 1/ε. It does not seem applicable to our geometric properties.

Goldreich and Ron [12] study property testing in bounded degree graphs 
represented by adjacency lists. Note that an image in the pixel model can be 
viewed as a graph of maximum degree 4 where vertices correspond to black pixels and two vertices
are connected by an edge if the corresponding entries in the image matrix are
adjacent. (See the definition of the image graph at the beginning of Section 4.)
Goldreich and Ron measure distance between graphs as the ratio of the number 
of edges that need to be changed to transform one graph into the other over 
the maximum possible number of edges in the graphs with the given number 
of vertices and degree. In our case, the distance between two image graphs cor- 
responds to the fraction of points (vertices) on which they differ, i.e. the edge 
structure of the graphs is fixed, and only vertices can be added or removed to 
transform one graph into another. Our connectedness test is exactly the same as 
the connectivity test in [12], with one minor variation due to different input rep- 
resentation and the fact that the pixel model allows graphs with a small number 
of vertices. (In the bounded degree graph model, the number of vertices is a part 
of the input.) However, since our distance measures are different, their proof of 
correctness of the algorithm does not apply to the pixel model. 

One more paper that studies fast algorithms for connectedness in graphs 
is [5]. It shows how to approximate the number of connected components in an 
arbitrary graph in sublinear time.



1.4 Related Results in Learning 

In property testing terminology, a PAC (probably approximately correct) learning
algorithm [17] is given oracle access (or access via random samples) to an
unknown target object with the property 𝒫 and has to output a hypothesis which
is within relative distance ε of the target with high probability. If the hypothesis
is required to have the property 𝒫, the learning algorithm is proper. As proved
in [11], a proper PAC learning algorithm for 𝒫 with sampling complexity q(ε)
implies a (2-sided error) (ε, q(ε/2) + O(1/ε))-test for 𝒫.

Learning half-planes exactly is considered in [14]. This work gives matching
upper and lower bounds of Θ(log n) for the problem. In the PAC model,
a proper learning algorithm with O((1/ε) log(1/ε)) sampling complexity follows
from [3]. Together with the [11] result above, it implies a (2-sided error)
(ε, O((1/ε) log(1/ε)))-test for the half-plane property. Our result for testing
half-planes is a modest improvement of shaving off the log factor and making the
error 1-sided.

The generic approach of [11] for transforming PAC proper learners into property
tests does not seem to work well for convexity and connectedness. The
complexity of PAC learning algorithms is at least proportional to the
Vapnik-Chervonenkis (VC) dimension¹ [8]. Since the VC dimension of convexity is O(n) and the VC
dimension of connectedness is O(n²), the corresponding tests obtained by the
generic approach have query complexity guarantee O(n) and O(n²), respectively.
Our tests for these properties have query complexity independent of n.



2 Testing if an Image Is a Half-Plane 

First we present an algorithm for testing whether the image is a half-plane. An
image is a half-plane if there is a vector w ∈ ℝ² and a number a ∈ ℝ such that a
pixel x is black if and only if wᵀx > a. The algorithm first finds a small region
within which the dividing line falls. Then it checks if pixels on one side of the
region are white and on the other side are black.

Call pixels (0, 0), (0, n − 1), (n − 1, 0), (n − 1, n − 1) corners. Call the first and
the last row and the first and the last column of the matrix sides. For a pair
of pixels p_1, p_2, let ℓ(p_1, p_2) denote the line² through p_1, p_2. Let R_1(p_1, p_2) and
R_2(p_1, p_2) denote the regions into which ℓ(p_1, p_2) partitions the image pixels not
on the line.



Half-plane test T1(ε)

Given access to an n × n pixel matrix,

1. Query the four corners. Let s be the number of sides with differently
colored corners.

(a) If s = 0 (all corners are of the same color c), query ln 3/ε pixels
independently at random. Accept if all of them have color c.
Reject otherwise.

(b) If s = 2,

i. For both sides with differently colored corners, do binary
search of pixels on the side to find two differently colored
pixels within distance less than εn/2. For one side, call the
white pixel w_1 and the black pixel b_1. Similarly, define w_2
and b_2 for the second side.

ii. Let W_i = R_i(w_1, w_2) and B_i = R_i(b_1, b_2) for i = 1, 2.
W.l.o.g., suppose W_2 and B_1 intersect while W_1 and B_2 do
not. Query 2 ln 3/ε pixels from W_1 ∪ B_2 independently at random.
Accept if all pixels from W_1 are white and all pixels from
B_2 are black. Otherwise, reject.

(c) If s is not 0 or 2, reject.
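The steps of T1 can be sketched in Python as follows. Several choices here are our own assumptions, not the paper's. The sample sizes are chosen so that the acceptance bounds of the analysis hold: ln 3/ε in case (a), since (1 − ε)^{ln 3/ε} < 1/3, and 2 ln 3/ε in step 1(b)ii, since (1 − ε/2)^{2 ln 3/ε} < 1/3. W_1 is taken to be the side of ℓ(w_1, w_2) away from the black pixels, and B_2 the side of ℓ(b_1, b_2) away from the white pixels. Degenerate configurations (coinciding or collinear points) yield an empty region. The sketch enumerates the regions by computation, which costs no queries. Only calls to query count toward the query complexity.

```python
import math
import random


def _cross(o, a, p):
    """(a - o) x (p - o); its sign says on which side of line(o, a) the point p lies."""
    return (a[0] - o[0]) * (p[1] - o[1]) - (a[1] - o[1]) * (p[0] - o[0])


def _beyond(p, u1, u2, r1, r2):
    """True iff p lies strictly on the side of line(u1, u2) away from r1 and r2.
    Degenerate configurations give an empty region (a simplification of this sketch)."""
    if u1 == u2:
        return False
    ref = _cross(u1, u2, r1) + _cross(u1, u2, r2)
    side = _cross(u1, u2, p)
    return ref != 0 and side != 0 and (side > 0) != (ref > 0)


def half_plane_test(query, n, eps, rng=random):
    """Sketch of T1(eps) for n >= 2; query((i, j)) returns 1 (black) or 0 (white)."""
    corners = [(0, 0), (n - 1, 0), (n - 1, n - 1), (0, n - 1)]  # cyclic order
    color = {c: query(c) for c in corners}
    sides = zip(corners, corners[1:] + corners[:1])
    mixed = [(a, b) for a, b in sides if color[a] != color[b]]

    if not mixed:  # case (a): s = 0
        c = color[corners[0]]
        k = math.ceil(math.log(3) / eps)
        return all(query((rng.randrange(n), rng.randrange(n))) == c for _ in range(k))
    if len(mixed) != 2:  # case (c): s = 4
        return False

    pairs = []  # step 1(b)i: binary search on both mixed sides
    for a, b in mixed:
        step = ((b[0] - a[0]) // (n - 1), (b[1] - a[1]) // (n - 1))

        def at(t, a=a, step=step):
            return (a[0] + t * step[0], a[1] + t * step[1])

        lo, hi = 0, n - 1  # invariant: at(lo) has color[a], at(hi) does not
        while hi - lo > 1 and hi - lo >= eps * n / 2:
            mid = (lo + hi) // 2
            if query(at(mid)) == color[a]:
                lo = mid
            else:
                hi = mid
        pairs.append((at(lo), at(hi)) if color[a] == 0 else (at(hi), at(lo)))
    (w1, b1), (w2, b2) = pairs  # (white, black) on each side

    # Step 1(b)ii: W_1 lies beyond line(w1, w2), B_2 beyond line(b1, b2).
    pixels = [(i, j) for i in range(n) for j in range(n)]
    pool = [(p, 0) for p in pixels if _beyond(p, w1, w2, b1, b2)]
    pool += [(p, 1) for p in pixels if _beyond(p, b1, b2, w1, w2)]
    k = math.ceil(2 * math.log(3) / eps)
    return not pool or all(query(p) == want for p, want in (rng.choice(pool) for _ in range(k)))


if __name__ == "__main__":
    def half_plane(p):
        return int(p[0] + 2 * p[1] > 40)

    print(half_plane_test(half_plane, 40, 0.2, random.Random(1)))  # True
```

If the binary search stops at distance 1, it has found adjacent pixels. This is the best possible and covers the case εn/2 ≤ 1.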



¹ The VC dimension is the cardinality of the largest set X ⊆ {0, ..., n − 1}² shattered
by 𝒫. A set X ⊆ {0, ..., n − 1}² is shattered by 𝒫 if for every partition (X_0, X_1) of
X, 𝒫 contains a matrix M with M_x = 1 for all x ∈ X_1 and M_x = 0 for all x ∈ X_0.

² Whenever a geometric notion (e.g., line, angle, convex hull) is used without a
definition, it refers to the standard continuous notion. All discretized notions are defined.
Theorem 1. Algorithm T1 is a 1-sided error (ε, 2 ln 3/ε + o(1/ε))-test for the
half-plane property.

Proof. The algorithm queries at most 2 ln 3/ε + O(log(1/ε)) pixels. To prove
correctness, we need to show that all half-planes are always accepted, and all images
that are ε-far from being half-planes are rejected with probability at least 2/3.

Case (a) [0 differently colored sides]: The image is a half-plane if and only if it
is unicolored. If it is unicolored, the test always accepts since it never finds pixels
of different colors. If the image is ε-far from being a half-plane, it has at least
εn² pixels of a wrong color. Otherwise, it could be made unicolored, and hence a
half-plane, by changing less than an ε-fraction of pixels. The test fails to find an
incorrectly colored pixel and accepts with probability at most (1 − ε)^{ln 3/ε} < 1/3.

Case (b) [2 differently colored sides]: The test always accepts all half-planes
because it rejects only if it finds two white pixels and two black pixels such that
the line through the white pixels intersects the line through the black pixels.

It remains to show that if an image is ε-far from being a half-plane, it is rejected with
probability ≥ 2/3. We prove the contrapositive, namely, that if an image is rejected with
probability < 2/3, modifying an ε fraction of pixels can change it into a half-plane.

Suppose that an image is accepted with probability > 1/3 > (1 − ε/2)^{2 ln 3/ε}.
That means that less than an ε/2 fraction of the pixels from
which we sample in step 1(b)ii differ from the
color of their region (white for W_1 and black
for B_2). Note also that there are at most εn²/2
pixels outside of W_1 ∪ B_2. Changing the color
of all black pixels in W_1 and all white pixels in
B_2 and making all pixels outside of those regions
white creates a half-plane by changing
less than an ε fraction of the pixels, as required.

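Case (c) [everything else]: The number of image sides with differently colored
corners is even (0, 2, or 4). That holds because the cycle ((0, 0), (n − 1, 0), (n − 1, n − 1),
(0, n − 1), (0, 0)) changes color exactly when it moves along such a side, and it
returns to the color it started with. So, the only remaining case is 4 differently colored
sides, where opposite corners share a color and adjacent corners differ. In this case,
the image cannot be a half-plane: the two diagonals of the image intersect, so no line
separates the two white corners from the two black corners. The test always rejects. □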
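The parity claim in Case (c) can be checked exhaustively over all 16 corner colorings. In this small Python check, corners are listed in the cyclic order used above, and mixed_sides is a hypothetical helper name.

```python
from itertools import product


def mixed_sides(corner_colors):
    """Number of image sides whose two corners differ; corners given in cyclic order."""
    return sum(corner_colors[k] != corner_colors[(k + 1) % 4] for k in range(4))


counts = {mixed_sides(c) for c in product((0, 1), repeat=4)}
print(sorted(counts))  # [0, 2, 4]
```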







3 Convexity Testing 

The image is convex if the convex hull of black pixels contains only black pixels.
The test for convexity first roughly determines the object by querying pixels on
the (n/u) × (n/u) grid with a side of size u = Θ(εn). Then it checks if the object
corresponds to the rough picture it obtained.
For all indices i, j divisible by u, call the set {(i', j') | i' ∈ [i, i + u], j' ∈ [j, j + u]}
a u-square. We refer to pixels (i, j), (i + u, j), (i + u, j + u), and (i, j + u) as its
corners.

Convexity test T2(ε)

Given access to an n × n pixel matrix,

1. Query all pixels with both coordinates divisible by u = ⌊εn/120⌋.

2. Let B be the convex hull of discovered black pixels. Query 5/ε pixels
from B independently at random. Reject if a white pixel in B is
found in steps 1 or 2.

3. Let W be the union of all u-squares which contain no pixels from B.
Query 5/ε pixels from W independently at random. Reject if a black
pixel is found. Otherwise accept.
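The three steps of T2 can be sketched in Python as follows. Several details are assumptions of this sketch. The convex hull uses Andrew's monotone chain, a standard algorithm the paper does not prescribe. The 5/ε sample sizes are consistent with the bound (5/ε)·(ε/15) = 1/3 in the analysis. The guard u ≥ 1 for small n and the optional u override exist only so that small examples run. Listing B and W is computation; only calls to query count as queries.

```python
import math
import random


def _cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def in_hull(hull, p):
    """Closed point-in-convex-polygon test (also handles 0, 1 or 2 hull vertices)."""
    if len(hull) <= 1:
        return p in hull
    if len(hull) == 2:
        a, b = hull
        return (_cross(a, b, p) == 0
                and min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
                and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))
    return all(_cross(hull[k], hull[(k + 1) % len(hull)], p) >= 0 for k in range(len(hull)))


def convexity_test(query, n, eps, rng=random, u=None):
    """Sketch of T2(eps); query((i, j)) returns 1 (black) or 0 (white)."""
    if u is None:
        u = max(1, math.floor(eps * n / 120))  # max(1, .) guards small n (our addition)
    k = math.ceil(5 / eps)
    # Step 1: query every pixel whose coordinates are both divisible by u.
    grid = {(i, j): query((i, j)) for i in range(0, n, u) for j in range(0, n, u)}
    hull = convex_hull([p for p, c in grid.items() if c == 1])
    if any(c == 0 and in_hull(hull, p) for p, c in grid.items()):
        return False
    # Step 2: sample from B (listing B is computation, not queries).
    B = sorted((i, j) for i in range(n) for j in range(n) if in_hull(hull, (i, j)))
    if B and any(query(rng.choice(B)) == 0 for _ in range(k)):
        return False
    # Step 3: W is the union of u-squares [i, i+u] x [j, j+u] with no pixel of B.
    B_set, W = set(B), set()
    for i0 in range(0, n, u):
        for j0 in range(0, n, u):
            square = {(i, j) for i in range(i0, min(i0 + u, n - 1) + 1)
                      for j in range(j0, min(j0 + u, n - 1) + 1)}
            if not square & B_set:
                W |= square
    W = sorted(W)
    return not (W and any(query(rng.choice(W)) == 1 for _ in range(k)))


if __name__ == "__main__":
    def rectangle(p):
        return int(5 <= p[0] <= 20 and 8 <= p[1] <= 25)

    print(convexity_test(rectangle, 30, 0.3, random.Random(0), u=1))  # True
```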

Lemma 1, used in the analysis of the
convexity test, asserts that the number
of pixels outside B ∪ W is small.

Lemma 1. In an n × n image, let B
be the convex hull of black pixels with
coordinates divisible by u. Let W be the
union of u-squares which contain no
pixels from B. Let the "fence" F be the
set of pixels not contained in B or W.

Then F contains at most 4un pixels.

Proof. Intuitively, F is the largest when
it contains all u-squares along the sides
of the image. We call u-squares that are
not fully contained in B or W fence
u-squares. Note that F is covered by
fence u-squares. Therefore, to prove the
lemma it is enough to show that there
are at most 4n/u fence u-squares.

To count the fence u-squares, we
define a cyclic ordering on them. To
do that, we describe a walk that connects
the centers of all fence u-squares. The
walk goes from one center to the next
by traveling left, right, up or down.

It visits the centers of fence u-squares
by traveling clockwise and keeping the
boundary between F and W on the
left-hand side. Each fence u-square is
visited because it intersects with some
u-square in W in at least one pixel.



Fig. 2. Convexity test
There are n/u rows of u-squares. We claim that from each of these rows the
walk can travel up at most once. Suppose for contradiction that it goes up twice,
from ℓ_1 to ℓ_2 and from r_1 to r_2, where ℓ_1 and r_1 are fence u-squares with centers
in row (k + 0.5)u, and ℓ_2 and r_2 are fence u-squares with centers in row (k + 1.5)u
for some integer k.

W.l.o.g. suppose that the centers of ℓ_1, ℓ_2 are
in a column with a lower index than the centers of
r_1, r_2. Since the walk keeps the boundary between
W and F on the left-hand side, the left corners
of ℓ_1, ℓ_2, r_1, r_2 are in W. By definition of fence
u-squares, ℓ_1, ℓ_2, r_1, r_2 each contain a pixel from
B. The common left corner of r_1 and r_2 is also in
B, since B is convex. But this is a contradiction
because W and B are disjoint.

Thus, the walk can travel up only once per row. Similarly, it can travel down 
only once per row, and travel left (right) only once per column. Since there are 
n/u rows (columns) of u-squares, the walk can have at most 4n/u steps. As it 
visits all fence u-squares, there are at most 4n/u of them. Since each u-square 
contributes u² pixels, the number of pixels in F is at most 4nu. □

The analysis of the convexity test uses the fact that if an image is convex, 
W contains only a small number of black pixels. Proposition 1 proves this fact 
for a special case of an image which is “invisible” on the big grid. Later, we use 
the proposition to handle the general case in lemma 2. 

Proposition 1. In an n x n convex image, if all pixels with both coordinates 
divisible by u are white, then the image contains less than 2un black pixels. 

Proof. Let black(r) denote the number of black pixels in a row r. If each row
contains at most u − 1 black pixels, the total number of black pixels is less than un.
Otherwise, consider a row r with black(r) ≥ u. Let integers k and t be such
that r = ku + t and 0 < t < u. Since the image is convex, black pixels of every
fixed row must have consecutive column indices. Since every pixel with both
coordinates divisible by u is white, black(ku) < u and black((k + 1)u) < u.

Because of the convexity of the object, if black(r_1) < black(r) for a row r_1 > r
then black(r_2) ≤ black(r_1) for all rows r_2 > r_1. Similarly, if black(r_1) < black(r)
for a row r_1 < r then black(r_2) ≤ black(r_1) for all rows r_2 < r_1. Thus, all rows
r_2 excluding ku + 1, ku + 2, ..., (k + 1)u − 1 have black(r_2) < u. Together, they
contain less than (n − u)u black pixels. Cumulatively, the remaining u − 1 rows contain
at most (u − 1)n pixels. Therefore, the image contains less than 2un black pixels. □

Lemma 2. In an n × n convex image, let W be the union of all u-squares which
contain no pixels from B. Then W contains less than 8un black pixels.

Proof. As before, let F be the set of all pixels not contained in B or W. We call
pixels on the boundary between F and W with both coordinates divisible by u
fence posts. Since all fence posts are white, any portion of the object protruding
into W has to squeeze between the fence posts. We show that there are at most
three large protruding pieces, each of which, by Proposition 1, contains less than
2un pixels. All other protruding portions fall close to the fence and are covered
by an area containing less than 2un pixels.

Let O be the boundary of our convex object. O can
be viewed as a piecewise linear trajectory on the plane
that turns 360°. Whenever O leaves region F to go into
W, it has to travel between two fence posts. Whenever
O comes back into F, it has to return between the same
fence posts because the object is convex and fence posts
do not belong to it. The figure depicts an excursion of
O into W with accumulated turn α.

Notice that since O turns 360° in total, at most 3 excursions into W have
accumulated turn > 90°. Each of them can be viewed as
delineating a part of our convex object, cut off by the line between the fence
posts. This part of the object is convex, and therefore, by Proposition 1, has
less than 2un pixels. This gives us a total of less than 6un pixels for the protruding parts
where O turns more than 90°.

Consider any excursion into W where O leaves F between fence posts p_1 and
p_2 and turns at most 90° before coming back. Any such trajectory part lies inside the
circle of diameter u containing p_1 and p_2. The half of the circle protruding into
W is covered by half of a u-square. By an argument identical to counting fence
squares in Lemma 1, there are at most 4n/u segments of the F/W boundary
between adjacent fence posts. Therefore, the total number of pixels that might
be touched by the parts of the object described by O's excursions into W that
turn at most 90° is at most (4n/u) · (u²/2) = 2un.

Thus, the total number of black pixels in W is less than 8un. □

Theorem 2. Algorithm T2 is a (2-sided error) (ε, O(1/ε²))-test for convexity.

Proof. The test makes (n/u)² + O(1/ε) = O(1/ε²) queries. We bound the failure
probability, considering convex and far-from-convex images separately.

If the input image is convex, B contains only black pixels. The test never
rejects in step 2. By Lemma 2, the fraction of black pixels in W is less than 8u/n = ε/15.
By the union bound, the probability that the test rejects in step 3 is less than
(5/ε) · (ε/15) = 1/3.

If the input image is ε-far from convex, it has at least 2εn²/5 white pixels in B
or at least 2εn²/5 black pixels in W. Otherwise, we could make the image convex by
making all pixels in W white and all remaining pixels black. It would require
fewer than 2εn²/5 changes in B, fewer than 2εn²/5 changes in W, and by Lemma 1, at most
4un < εn²/5 changes in F. Thus, the distance of the image to convex would be less than εn².

Suppose w.l.o.g. that there are at least 2εn²/5 black pixels in W, i.e., at least a 2ε/5
fraction of the pixels of W are black. Step 3 will fail to find a black pixel with probability
at most (1 − 2ε/5)^{5/ε} < 1/3. □




4 Connectedness Testing 

Define the image graph G_M = (V, E) of image matrix M by V = {(i, j) | M_{i,j} = 1}
and E = {((i_1, j), (i_2, j)) | |i_1 − i_2| = 1} ∪ {((i, j_1), (i, j_2)) | |j_1 − j_2| = 1}, where both
endpoints of every edge belong to V. In other
words, the image graph consists of black pixels connected by the grid lines. The
image is connected if its image graph is connected. When we say that the image
has k connected components, we are also referring to its image graph.

The test for connectedness looks for isolated components of size less than
d = 4/ε². We prove that a significant fraction of pixels are in such components if
the image is far from connected. When a small isolated component is discovered,
the test rejects if it finds a black pixel outside of the component. Lemma 3
implies that if an image is far from connected, it has a large number of connected
components. An averaging argument in Lemma 4 demonstrates that many of
them have to be small. This gives rise to a simple test T3, which is later improved
to test T4 with more careful accounting in Proposition 2.

Both tests for connectedness and Proposition 2 are adapted from [12]. The
only change in the tests, besides parameters, is that after finding a small
component, we make sure there is some point outside of it before concluding that
the image is far from connected.

Connectedness test T3(ε)

Let δ = ε²/4 − o(1) and d = 4/ε². Given access to an n × n pixel matrix,

1. Query 2/δ pixels independently at random.

2. For every pixel (i, j) queried in step 1, perform a breadth first search
(BFS) of the image graph starting from (i, j) until d black pixels are
discovered or no more new black pixels can be reached; i.e., for each
discovered black pixel query all its neighbors if they haven't been
queried yet. If no more new black pixels can be reached, a small
connected component has been found.

3. If a small connected component is discovered for some (i, j) in step 2,
query 2/ε pixels outside of the square [i − d, i + d] × [j − d, j + d]
independently at random. If a black pixel is discovered, reject.
Otherwise (if no small connected component is found or if no black
pixel is discovered outside of the small component), accept.
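Test T3 can be sketched in Python as follows, under stated assumptions. The o(1) correction in δ is dropped, and step 3 samples pixels outside the square by rejection sampling. The loop over sampled pixels also stops at the first small component found, which only saves queries compared with running BFS from every sampled pixel. The function names are hypothetical.

```python
import math
import random
from collections import deque


def bfs_small_component(query, n, start, limit):
    """BFS in the image graph from `start`. Return the component if it is isolated
    with fewer than `limit` black pixels; otherwise (or for a white start) None."""
    if query(start) != 1:
        return None
    seen, comp, frontier = {start}, [start], deque([start])
    while frontier:
        i, j = frontier.popleft()
        for p in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if 0 <= p[0] < n and 0 <= p[1] < n and p not in seen:
                seen.add(p)
                if query(p) == 1:
                    comp.append(p)
                    if len(comp) >= limit:
                        return None  # d black pixels found: not a small component
                    frontier.append(p)
    return comp


def connectedness_test(query, n, eps, rng=random):
    """Sketch of T3(eps); query((i, j)) returns 1 (black) or 0 (white)."""
    d = math.ceil(4 / eps ** 2)
    delta = eps ** 2 / 4  # the -o(1) correction in delta is dropped in this sketch
    found = None
    for _ in range(math.ceil(2 / delta)):  # steps 1 and 2
        start = (rng.randrange(n), rng.randrange(n))
        if bfs_small_component(query, n, start, d) is not None:
            found = start
            break
    if found is None:
        return True
    i, j = found
    if i - d <= 0 and j - d <= 0 and i + d >= n - 1 and j + d >= n - 1:
        return True  # the square covers the whole image; nothing lies outside it
    for _ in range(math.ceil(2 / eps)):  # step 3
        while True:  # rejection sampling of a pixel outside the square
            a, b = rng.randrange(n), rng.randrange(n)
            if abs(a - i) > d or abs(b - j) > d:
                break
        if query((a, b)) == 1:
            return False
    return True


if __name__ == "__main__":
    def checkerboard(p):
        return int((p[0] + p[1]) % 2 == 0)  # every black pixel is isolated

    print(connectedness_test(checkerboard, 100, 0.5, random.Random(0)))
```

The test has 1-sided error for the same reason as in the text. A small component of a connected image is the whole image, and it lies inside the square around its starting pixel, so step 3 never finds a black pixel outside it.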



Lemma 3. If an n × n image contains at most p connected components, they
can be linked into one connected component by changing at most n(√(2p) + O(1))
pixel values from white to black.

Proof. Let s = n√(2/p). To turn the image into one connected component, we first
add the comb-like set S = {(i, j) | j = n − 1 or i = n − 1 or s divides i}. Now every
connected component is linked to S by adding at most s/2 pixels leading to the
nearest "tooth of the comb". That is, if a component contains a pixel (ks + i, j) for
an integer k and 0 < i ≤ s/2, add pixels (ks + 1, j), (ks + 2, j), ..., (ks + i − 1, j).
Otherwise (a component contains a pixel (ks + i, j) for integer k and s/2 < i < s),
add pixels (ks + i + 1, j), (ks + i + 2, j), ..., (ks + s − 1, j). The first stage
adds |S| = n(n/s + O(1)) pixels and the second, less than s/2 per connected
component, adding the total of n(n/s + O(1)) + ps/2 = n(√(2p) + O(1)) pixels. □
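With s = n√(2/p), the final count in the proof can be checked term by term:

```latex
\frac{n^2}{s} = \frac{n^2}{n\sqrt{2/p}} = n\sqrt{p/2},
\qquad
\frac{ps}{2} = \frac{p\,n\sqrt{2/p}}{2} = n\sqrt{p/2},
\qquad
n\Big(\frac{n}{s} + O(1)\Big) + \frac{ps}{2} = 2n\sqrt{p/2} + O(n) = n\big(\sqrt{2p} + O(1)\big).
```

This choice of s balances the comb cost n²/s against the linking cost ps/2. Setting the derivative −n²/s² + p/2 to zero gives s² = 2n²/p, so this s minimizes their sum.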
Lemma 4. If an image is ε-far from connected, at least an ε²/4 − o(1) fraction
of its pixels are in connected components of size less than d = 4/ε² + o(1).

Proof. Consider an n × n image that is ε-far from connected and has p connected
components. By Lemma 3, changing at most n(√(2p) + O(1)) pixels makes it connected. Then
n(√(2p) + O(1)) ≥ εn², and p ≥ ε²n²/2 − O(n). Let b be the number of black
pixels. The average component size is b/p ≤ n²/(ε²n²/2 − O(n)) = 2/ε² + o(1).
Thus, by Markov's inequality, the fraction of components of size up to d = 4/ε² + o(1) is at least 1/2.
That is, there are at least p/2 ≥ ε²n²/4 − O(n) such components. Since each connected
component contains a pixel, at least an ε²/4 − o(1) fraction of pixels are in connected
components of size less than d. □

Theorem 3. Algorithm T3 is a 1-sided (ε, O(ε^{−4}))-test for connectedness.

Proof. The algorithm accepts all connected images because it rejects only if an
isolated component and some pixel outside of it are found.

It remains to show that an image ε-far from connected is rejected with probability
at least 2/3. By Lemma 4, such an image has at least a δ fraction of its
pixels in connected components of size less than d. The probability that step 1
fails to find a pixel from a small connected component is (1 − δ)^{2/δ} < e^{−2}. In
step 2, 3d − 1 queries are sufficient to discover that a component of size d − 1 is
isolated because it has at most 2d neighboring white pixels. There are at least
εn² − 4d² black pixels outside of the 2d × 2d square containing the small isolated
component. Step 3 will fail to find a black pixel with probability (1 − ε)^{2/ε} < e^{−2}.
By the union bound, the failure probability is at most 2/e² < 1/3.

The number of queries is at most (2/δ) · (3d − 1) + 2/ε = O(ε^{−4}). □

The algorithm can be improved by employing the Goldreich-Ron trick [12] of
considering small components of different sizes separately. The following
proposition is adapted from [12].

Proposition 2. If an image has at least C connected components of size less
than d, there is ℓ ≤ log d such that at least C · 2^{ℓ−1}/log d points are in connected
components of size between 2^{ℓ−1} and 2^ℓ − 1.

Proof. Since there are at most log d size classes, for some ℓ ≤ log d the image has at least C/log d
connected components of size between 2^{ℓ−1} and 2^ℓ − 1. Each of them contains at least 2^{ℓ−1} points. □
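Proposition 2 is a pigeonhole argument over dyadic size classes. The Python sketch below uses a hypothetical helper name. It groups component sizes by the level ℓ with 2^{ℓ−1} ≤ size ≤ 2^ℓ − 1, which is exactly size.bit_length(), and reports how many components and points fall in each level.

```python
from collections import Counter


def dyadic_levels(component_sizes, d):
    """Map level l -> (number of components, number of points) for components
    with 2**(l-1) <= size <= 2**l - 1 and size < d."""
    comps, points = Counter(), Counter()
    for size in component_sizes:
        if 1 <= size < d:
            level = size.bit_length()  # exactly the l with 2**(l-1) <= size < 2**l
            comps[level] += 1
            points[level] += size
    return {l: (comps[l], points[l]) for l in sorted(comps)}


if __name__ == "__main__":
    print(dyadic_levels([1, 1, 1, 1, 3, 3, 6], d=8))  # {1: (4, 4), 2: (2, 6), 3: (1, 6)}
```

In the example, C = 7 and log d = 3. Level 1 holds 4 ≥ 7/3 components, hence at least 7 · 2⁰/3 points, as the proposition guarantees.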

(Improved) Connectedness test $T_4(\varepsilon)$

Let δ be as in Lemma 4 and $d = 4/\varepsilon^2$. Given access to an $n \times n$ pixel matrix,

1. For ℓ = 1 to log d

(a) Query $2\log d/(\delta\,2^{\ell-1})$ pixels independently at random.

(b) For every pixel (i, j) queried in step 1a, perform a BFS of the image graph starting from (i, j) until $2^\ell$ black pixels are discovered or no more new black pixels can be reached (a small connected component has been found).

2. If a small connected component is discovered for some (i, j) in step 1, proceed as in step 3 of algorithm $T_3$.
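Step 1(b) is an ordinary breadth-first search that stops after a fixed number of black pixels. Here is a minimal Python sketch. It assumes black pixels are stored as 1 and white as 0, and that the image graph joins horizontally and vertically adjacent pixels; this adjacency is our assumption. The returned counter records how many distinct pixels were read, which is the quantity a tester pays for.

```python
from collections import deque

def bounded_bfs(image, start, limit):
    """BFS over black pixels from `start`, stopping once `limit` black pixels are seen.

    image[i][j] is 1 for black and 0 for white; neighbours are the 4 adjacent
    pixels (an assumption). Returns (small_component_found, black_pixels, queries),
    where queries counts the distinct pixels read.
    """
    rows, cols = len(image), len(image[0])
    values = {}

    def query(p):
        if p not in values:
            values[p] = image[p[0]][p[1]]
        return values[p]

    if query(start) != 1:
        return False, set(), len(values)
    seen = {start}
    if len(seen) >= limit:
        return False, seen, len(values)
    frontier = deque([start])
    while frontier:
        i, j = frontier.popleft()
        for p in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= p[0] < rows and 0 <= p[1] < cols and p not in seen and query(p) == 1:
                seen.add(p)
                if len(seen) >= limit:
                    return False, seen, len(values)  # component has >= limit pixels
                frontier.append(p)
    return True, seen, len(values)  # the whole component has fewer than limit pixels
```

In $T_4$ this routine would be called with `limit` set to $2^\ell$ for each sampled pixel of round ℓ.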
Theorem 4. Algorithm $T_4$ is a 1-sided $(\varepsilon, O(\varepsilon^{-2}\log^2(1/\varepsilon)))$-test for connectedness.

Proof. The algorithm accepts all connected images because it rejects only if an isolated component and some pixel outside of it are found.

If an $n \times n$ image is ε-far from connected, then by the proof of Lemma 4 it has at least $\delta n^2$ connected components of size less than d. Proposition 2 implies that for some $\ell \le \log d$, at least a $\delta\,2^{\ell-1}/\log d$ fraction of its points are in connected components of size between $2^{\ell-1}$ and $2^\ell - 1$. For this ℓ, the probability that step 1 fails to find a pixel from a component of size between $2^{\ell-1}$ and $2^\ell - 1$ is at most $e^{-2}$. The rest of the correctness analysis is the same as in Theorem 3.

The number of queries is at most $\log d \cdot O(\log d/\delta) + 2/\varepsilon = O(\varepsilon^{-2}\log^2(1/\varepsilon))$. □



5 Conclusion and Open Problems 

Employing the Paradigm from the Half-Plane Test The strategy employed in the half-plane test of Section 2 is very simple. First we approximately learn the position of the dividing line. Then, using the fact that all half-planes consistent with our knowledge of the dividing line differ only on a fixed ε/2 fraction of the pixels, we randomly check whether the matrix agrees with these half-planes on the remaining pixels.

This suggests a general paradigm for transforming PAC learning algorithms into property tests with 1-sided error. Namely, consider a property $\mathcal{P}$ such that all objects with $\mathcal{P}$ that are ε/2-close to a given object are the same on all but an ε/2 fraction of the points. In addition, assume there is a proper PAC learning algorithm with sampling complexity q(n, ε). Then the following test for $\mathcal{P}$ has 1-sided error and query complexity $q(n, \varepsilon/2) + O(1/\varepsilon)$: learn the property within relative error ε/2, and then randomly test the object on points where all objects ε/2-close to the hypothesis coincide. The proof of this fact is very similar to case 2 of the analysis of the half-plane test.



Extensions and Lower Bounds We restricted our attention to images representable by binary matrices. However, real-life images have many colors (or intensity values). Property tests for images represented by integer-valued matrices would be a natural generalization. For example, one can generalize convexity in the following way. Call an image represented by an $n \times n$ matrix with values in $\mathbb{R}$ convex if the corresponding function $\{0, 1, \ldots, n-1\}^2 \to \mathbb{R}$ is convex.

A straightforward extension of our tests to d dimensions seems to give tests 
with dependence on d, and thus dependent on the size of the image. It would be 
interesting to investigate if this dependence is necessary. 

It is known that testing some properties requires a number of queries linear in 
the size of the input [4,2]. However, known hard properties do not seem to have a 
natural geometric interpretation. It would be nice to find natural 2-dimensional 
visual properties which are hard to test. One such result follows directly from [1], which shows that testing whether a string of length n is a shift of another string requires queries. This implies that testing whether the lower half of an $n \times n$ image is a shift of the upper half requires queries. It would be interesting to find even harder visual properties.
Abstract. We show that a random instance of a weighted maximum constraint satisfaction problem (or MAX 2-CSP), whose clauses are over pairs of binary variables, is solvable by a deterministic algorithm in polynomial expected time, in the "sparse" regime where the expected number of clauses is half the number of variables. In particular, a maximum cut in a random graph with edge density 1/n or less can be found in polynomial expected time.

Our method is to show, first, that if a MAX 2-CSP has a connected underlying graph with n vertices and m edges, the solution time can be deterministically bounded by $2^{(m-n)/2}$. Then, analyzing the tails of the distribution of this quantity for a component of a random graph yields our result. An alternative deterministic bound on the solution time, as $2^{m/5}$, improves upon a series of recent results.



1 Introduction 

In this paper we prove that a maximum cut of a sparse random graph can be 
found in polynomial expected time. 

Theorem 1. For any c < 1, a maximum cut of a random graph G(n, c/n) can be found in time whose expectation is poly(n), and using space O(m + n), where m is the size of the graph.

Our approach is to give a deterministic algorithm and bound its running 
time on any graph in terms of size and cyclomatic number. We then bound the 
expected running time for random instances by bounding the distribution of 
cyclomatic number in components of a sparse random graph. 

Theorem 2. Let G be a connected graph with m edges and n vertices. There is an algorithm that finds a maximum cut of G in time $O(m+n)\min\{2^{m/5}, 2^{(m-n)/2}\}$, and in space O(m + n).

S. Arora et al. (Eds.): APPROX 2003+RANDOM 2003, LNCS 2764, pp. 382-395, 2003.
We remark that the bound in Theorem 2 is of independent interest, and improves on previous algorithms giving bounds of $2^{m/4}\,\mathrm{poly}(m+n)$ [KF02] and $2^{m/3}\,\mathrm{poly}(m+n)$ [GHNR].

In fact, the algorithm employs several local reductions that take us outside 
the class of max cut problems. We therefore work with the larger class max 
2-CSP of weighted maximum constraint satisfaction problems consisting of con- 
straints on pairs (and singletons) of variables, where each variable may take two 
values. Theorems 1 and 2 are then special cases of the more general Theorems 
3 and 5 below. 

1.1 Context 

Our results are particularly interesting in the context of phase transitions for various maximum constraint-satisfaction problems. Since the technicalities are not relevant to our result, but only help to put it into context, we will be informal. It is well known that a random 2-SAT formula with density c < 1 (where the number of clauses is c times the number of variables) is satisfiable with probability tending to 1 as the number n of variables tends to infinity, while for c > 1 the probability of satisfiability tends to 0 as $n \to \infty$ [CR92, Goe96, FdlV92]; for more detailed results, see [BBG+01]. More recently, MAX 2-SAT has been shown to exhibit similar behavior: for c < 1 only an expected Θ(1/n) clauses go unsatisfied, while for c > 1, Θ(n) clauses are unsatisfied [GGHS03, GGHS].

For a random graph G(n, c/n) with c < 1, the graph almost surely consists solely of small trees and unicyclic components, while for c > 1 it almost surely contains a "giant", complex component, of order Θ(n) [Bol01]. Again, [GGHS] proves the related facts that in a maximum cut of such a graph, for c < 1 only an expected O(1) edges fail to be cut, while for c > 1 it is Θ(n).

Theorem 3 is concerned with algorithms that run in polynomial expected time. Results on coloring random graphs in polynomial expected time can be found in [KV02, COMS, TCO03]. For both MAX CUT and MAX 2-SAT, it seems likely that the mostly-satisfiable (or mostly-cuttable) sparse instances are algorithmically easy, while the not-so-satisfiable dense instances are algorithmically hard. While, as far as we are aware, little is known about the hardness of dense instances, our results here confirm that not only are typical sparse MAX CUT instances easy, but even the atypical ones can be accommodated in polynomial expected time; see the Conclusions for further discussion.

1.2 Outline of Proof 

Our proof of Theorem 3 has a few main parts. Since the maximum cut of a 
graph is the combination of maximum cuts of each of its connected components, 
it suffices to bound the expected time to partition the component containing a 
fixed vertex. 

In Theorem 5 we show that Algorithm A's running time on a component is bounded by a function of the component's cyclomatic number, the number of edges less the number of vertices plus one. For brevity we will call this the "excess" (a slight abuse of the standard meaning, which is just edges minus vertices). Theorem 5 also gives a $2^{m/5}\,\mathrm{poly}(m+n)$ bound on the running time.

In the randomized setting, Lemma 8 provides a bound on the exponential 
moments of the excess of a component. It does so by “exploring” the component 
as a branching process, dominating it with a similar process, and analyzing the 
latter as a random walk. This gives stochastic bounds on the component order 
u and, conditioned upon u, the “width” w (to be defined later); the excess is 
easily stochastically bounded in terms of u and w. 

Finally, we combine the running times, which are exponentially large in the 
excess, with the exponentially small large-deviation bounds on the excess, to 
show that Algorithm A runs in polynomial expected time. 

2 Solving a Maximum Constraint-Satisfaction Instance 

We begin by defining a class of weighted maximum constraint satisfaction prob- 
lems, or MAX CSPs, generalizing MAX CUT, and (in Theorem 5) bounding their 
running time in terms of parameters of an instance. 

2.1 Weighted Maximum Constraint-Satisfaction Problems 

We may think of MAX CUT as a MAX CSP in which the constraints simply prefer opposite "colors" on the endpoints of each edge, and all constraints have the same "weight". We generalize this not only for the sake of a more general result but because we need to: intermediate steps of Algorithm A, applied to a MAX CUT instance, generate instances of more general type.

For our purposes, a general instance of a (weighted) MAX 2-CSP consists of a graph G = (V, E) and a score function consisting of: a sum of "monadic constraint" scores of each vertex and its color, "dyadic" scores of each edge and the pair of colors at its endpoints, and (for notational convenience) a single "niladic" score (a constant). Specifically, there is a (niladic) score $s_0$; for each $x \in V$ (monad) there is a pair of scores $s^x_R, s^x_B$ corresponding to the two ways that the vertex could be colored; and for each edge $e = \{x, y\} \in E$ (dyad) there is a 4-tuple of scores $s^{xy}_{RR}, s^{xy}_{RB}, s^{xy}_{BR}, s^{xy}_{BB}$ corresponding to the four ways that the edge could be colored. The score of a coloring $\phi : V \to \{R, B\}$ is

$$S(\phi) := s_0 + \sum_{x \in V} s^x_{\phi(x)} + \sum_{\{x,y\} \in E} s^{xy}_{\phi(x)\phi(y)}.$$

(Note that for any $C, D \in \{R, B\}$, $s^{xy}_{CD}$ and $s^{yx}_{DC}$ refer to the same score, and thus must be equal.) Let S refer to the full collection of scores $s_0$, $s^x_C$ and $s^{xy}_{CD}$ as above. Then MAX(V, E, S) is the computational problem of finding a coloring φ achieving $\max_\phi S(\phi)$.
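To make the score function concrete, here is a Python sketch of $S(\phi)$ and of an exhaustive $\max_\phi S(\phi)$ that is useful only for checking small cases. The representation is our own choice. Colors R and B are encoded as 0 and 1, `mono[x]` holds $[s^x_R, s^x_B]$, and each edge table is stored once, under an ordered pair `(x, y)`, as `table[c][d]` $= s^{xy}_{cd}$.

```python
import itertools

R, B = 0, 1  # the two colors

def score(s0, mono, dyad, phi):
    """S(phi) = s_0 + sum_x s^x_{phi(x)} + sum_{xy} s^{xy}_{phi(x) phi(y)}.

    mono[x] = [s^x_R, s^x_B]; dyad[(x, y)][c][d] = s^{xy}_{cd}, stored once per
    edge (the table for (y, x) would be its transpose)."""
    total = s0 + sum(mono[x][phi[x]] for x in mono)
    total += sum(table[phi[x]][phi[y]] for (x, y), table in dyad.items())
    return total

def brute_force_max(s0, mono, dyad):
    """MAX(V, E, S) by trying all 2^|V| colorings: exponential, for checking only."""
    vertices = sorted(mono)
    best = None
    for colors in itertools.product((R, B), repeat=len(vertices)):
        phi = dict(zip(vertices, colors))
        value = score(s0, mono, dyad, phi)
        if best is None or value > best[0]:
            best = (value, phi)
    return best
```

With zero monadic scores and every edge table equal to `[[0, 1], [1, 0]]` this is exactly MAX CUT. A triangle then has optimum 2, and a 4-cycle has optimum 4.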

As one quick example, MAX 2-SAT is such a MAX CSP. Using colors T (true) and F (false), a SAT constraint $\bar{A} \vee B$ is modelled as a dyadic constraint mapping (T, F) to score 0 (unsatisfied) and any other coloring to score 1 (satisfied).
Another example is MAX DiCUT, the problem of partitioning a directed graph to
maximize the number of edges passing from the left side to the right.
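The two examples amount to particular 2×2 dyadic score tables. A short Python sketch of these tables follows. The encoding is our assumption: index 0 stands for T (or the left side) and index 1 for F (or the right side), and `table[a][b]` is the score when the first endpoint gets `a` and the second gets `b`.

```python
# Index 0 stands for T (true) / L (left side), index 1 for F (false) / R (right side).

def sat_clause_table(neg_a: bool, neg_b: bool):
    """Dyadic table for a 2-SAT clause over variables A and B, where neg_a / neg_b
    say whether each literal is negated: 0 for the single falsifying assignment,
    1 for the other three."""
    table = [[1, 1], [1, 1]]
    # A literal is false when its variable takes this value: T (index 0) for a
    # negated literal, F (index 1) for a positive one.
    a_false = 0 if neg_a else 1
    b_false = 0 if neg_b else 1
    table[a_false][b_false] = 0
    return table

MAX_CUT_EDGE = [[0, 1], [1, 0]]  # reward different colors on the two endpoints
DICUT_ARC = [[0, 1], [0, 0]]     # arc (u, v) scores 1 iff u is on the left and v on the right
```

For instance, `sat_clause_table(True, False)` encodes $\bar{A} \vee B$ and gives `[[1, 0], [1, 1]]`: only (T, F) scores 0.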

Our main result is that a weighted MAX 2-CSP on a random graph G(n, c/n), c < 1, can be solved in polynomial expected time, per the following theorem.

Theorem 3. For any c < 1 and any n, let G be a random graph G(n, c/n), and let (G, S) be any weighted MAX 2-CSP instance over this graph. Then (G, S) can be solved exactly in expected time poly(n), and in space O(m + n).

2.2 Algorithm A 

In this section we give an algorithm for solving instances of weighted
MAX 2-CSP. The algorithm will use 3 types of reductions. We begin by defining these
reductions. We then show how the algorithm fixes a sequence in which to apply 
the reductions by looking at the underlying graph of the CSP. This sequence 
defines a tree of CSPs, which can be solved bottom-up to solve the original CSP. 
Finally, we bound the algorithm’s time and space requirements. 

Reductions The first two reductions each produce equivalent problems with 
fewer vertices, while the third produces a pair of problems, both with fewer 
vertices, one of which is equivalent to the original problem. 

Reduction I Let y be a vertex of degree 1, with neighbor x. Reducing (V, E, S) on y results in a new problem (V', E', S') with V' = V \ y and E' = E \ xy. S' is the restriction of S to V' and E', except that for C, D ∈ {R, B} we set

$${s'}^x_C = s^x_C + \max_D \{s^{xy}_{CD} + s^y_D\},$$

i.e., we set

$$\begin{aligned}
{s'}^x_R &= s^x_R + \max\{s^{xy}_{RR} + s^y_R,\ s^{xy}_{RB} + s^y_B\},\\
{s'}^x_B &= s^x_B + \max\{s^{xy}_{BR} + s^y_R,\ s^{xy}_{BB} + s^y_B\}.
\end{aligned}$$

Note that any coloring φ' of V' can be extended to a coloring φ of V in two ways, namely $\phi_R$ and $\phi_B$ (corresponding to the two colorings of y); and the defining property of the reduction is that $S'(\phi') = \max\{S(\phi_R), S(\phi_B)\}$. In particular, $\max_{\phi'} S'(\phi') = \max_\phi S(\phi)$, and an optimal coloring φ' for the problem MAX(V', E', S') can be extended to an optimal coloring φ for MAX(V, E, S) in constant time.
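Reduction I folds the pendant vertex y into the monadic scores of x. The following Python sketch implements that update and checks its defining property by exhaustive search on a path w-x-y. The names are ours, and colors R and B are encoded as 0 and 1.

```python
import itertools
import random

def eliminate_degree_one(s_x, s_y, s_xy):
    """Reduction I for a pendant vertex y attached to x (colors R=0, B=1).

    s_x, s_y: monadic scores [R, B]; s_xy[c][d]: score of edge xy when x has
    color c and y has color d. Returns s'^x with s'^x_c = s^x_c + max_d (s^{xy}_{cd} + s^y_d)."""
    return [s_x[c] + max(s_xy[c][d] + s_y[d] for d in (0, 1)) for c in (0, 1)]

def check_on_path(seed: int, trials: int) -> bool:
    """On a path w - x - y, compare exhaustive search before and after reducing y."""
    rng = random.Random(seed)

    def rand():
        return rng.randint(-5, 5)

    for _ in range(trials):
        s_w, s_x, s_y = ([rand(), rand()] for _ in range(3))
        s_wx, s_xy = ([[rand(), rand()], [rand(), rand()]] for _ in range(2))
        full = max(s_w[a] + s_x[b] + s_y[c] + s_wx[a][b] + s_xy[b][c]
                   for a, b, c in itertools.product((0, 1), repeat=3))
        new_x = eliminate_degree_one(s_x, s_y, s_xy)
        reduced = max(s_w[a] + new_x[b] + s_wx[a][b]
                      for a, b in itertools.product((0, 1), repeat=2))
        if full != reduced:
            return False
    return True
```

For MAX CUT, with $s^{xy}$ = `[[0, 1], [1, 0]]` and zero monadic scores, the result is `[1, 1]`: a pendant edge can always be cut, whatever color x receives.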





Reduction II Let y be a vertex of degree 2, with neighbors x and z. Reducing (V, E, S) on y results in a new problem (V', E', S') with V' = V \ y and E' = (E \ {xy, yz}) ∪ {xz}. S' is the restriction of S to V' and E', except that for C, D, E ∈ {R, B} we set

$${s'}^{xz}_{CD} = s^{xz}_{CD} + \max_E \{s^{xy}_{CE} + s^{yz}_{ED} + s^y_E\},$$

i.e., we set

$$\begin{aligned}
{s'}^{xz}_{RR} &= s^{xz}_{RR} + \max\{s^{xy}_{RR} + s^{yz}_{RR} + s^y_R,\ s^{xy}_{RB} + s^{yz}_{BR} + s^y_B\},\\
{s'}^{xz}_{RB} &= s^{xz}_{RB} + \max\{s^{xy}_{RR} + s^{yz}_{RB} + s^y_R,\ s^{xy}_{RB} + s^{yz}_{BB} + s^y_B\},\\
{s'}^{xz}_{BR} &= s^{xz}_{BR} + \max\{s^{xy}_{BR} + s^{yz}_{RR} + s^y_R,\ s^{xy}_{BB} + s^{yz}_{BR} + s^y_B\},\\
{s'}^{xz}_{BB} &= s^{xz}_{BB} + \max\{s^{xy}_{BR} + s^{yz}_{RB} + s^y_R,\ s^{xy}_{BB} + s^{yz}_{BB} + s^y_B\},
\end{aligned}$$

where our notation presumes that if xz was not an edge in E, then $s^{xz}_{CD} = 0$ for all colors C and D. As in Reduction I, any coloring φ' of V' can be extended to V in two ways, $\phi_R$ and $\phi_B$, and S' picks out the larger of the two scores. Also as in Reduction I, $\max_{\phi'} S'(\phi') = \max_\phi S(\phi)$, and an optimal coloring φ' for MAX(V', E', S') can be extended to an optimal coloring φ for MAX(V, E, S) in constant time. (Note that neither multiple edges nor loops
are created by this reduction, nor the next one.)
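Reduction II replaces the path x-y-z by a single edge table on xz. Below is a Python sketch of the update rule, together with an exhaustive check on a triangle, where xz is already an edge. The names are ours, colors are encoded as 0 and 1, and `s_xz=None` stands for the case where xz is not an edge.

```python
import itertools
import random

def eliminate_degree_two(s_y, s_xy, s_yz, s_xz=None):
    """Reduction II for a vertex y of degree 2 with neighbours x and z (R=0, B=1).

    s_xy[c][e] is the score of edge xy with x colored c and y colored e, and
    s_yz[e][d] likewise for yz. s_xz=None means xz was not an edge (all zeros).
    Returns s'^{xz}[c][d] = s^{xz}_{cd} + max_e (s^{xy}_{ce} + s^{yz}_{ed} + s^y_e)."""
    if s_xz is None:
        s_xz = [[0, 0], [0, 0]]
    return [[s_xz[c][d] + max(s_xy[c][e] + s_yz[e][d] + s_y[e] for e in (0, 1))
             for d in (0, 1)]
            for c in (0, 1)]

def check_on_triangle(seed: int, trials: int) -> bool:
    """Compare exhaustive search on x, y, z with exhaustive search on x, z after reducing y."""
    rng = random.Random(seed)

    def rand():
        return rng.randint(-5, 5)

    for _ in range(trials):
        s_x, s_y, s_z = ([rand(), rand()] for _ in range(3))
        s_xy, s_yz, s_xz = ([[rand(), rand()], [rand(), rand()]] for _ in range(3))
        full = max(s_x[a] + s_y[b] + s_z[c] + s_xy[a][b] + s_yz[b][c] + s_xz[a][c]
                   for a, b, c in itertools.product((0, 1), repeat=3))
        new_xz = eliminate_degree_two(s_y, s_xy, s_yz, s_xz)
        reduced = max(s_x[a] + s_z[c] + new_xz[a][c]
                      for a, c in itertools.product((0, 1), repeat=2))
        if full != reduced:
            return False
    return True
```

For MAX CUT, eliminating the middle vertex of a path x-y-z gives the table `[[2, 1], [1, 2]]`. Here x and z prefer equal colors, because then both old edges can be cut. If xz was already a cut edge, the table becomes the constant 2, which is the maximum cut of a triangle.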




Reduction III Let y be a vertex of degree 3 or higher. Where reductions I and II each had a single reduction of (V, E, S) to (V', E', S'), here we define a pair of reductions of (V, E, S), to $(V', E', S^R)$ and $(V', E', S^B)$, corresponding to assigning the color R or B to y. We define V' = V \ y, and E' as the restriction of E to V \ y. For C, D ∈ {R, B}, $S^C$ is the restriction of S to V \ y, except that we set

$$(s^C)_0 = s_0 + s^y_C,$$

and, for every neighbor x of y,

$$(s^C)^x_D = s^x_D + s^{xy}_{DC}.$$

In other words, $S^R$ is the restriction of S to V \ y, except that we set $(s^R)_0 = s_0 + s^y_R$ and, for every neighbor x of y,

$$(s^R)^x_R = s^x_R + s^{xy}_{RR}, \qquad (s^R)^x_B = s^x_B + s^{xy}_{BR}.$$
Similarly, $S^B$ is given by $(s^B)_0 = s_0 + s^y_B$ and, for every neighbor x of y,

$$(s^B)^x_R = s^x_R + s^{xy}_{RB}, \qquad (s^B)^x_B = s^x_B + s^{xy}_{BB}.$$

As in the previous reductions, any coloring φ' of V \ y can be extended to V in two ways, $\phi_R$ and $\phi_B$, corresponding to the color given to y, and now (this is different!) $S^R(\phi') = S(\phi_R)$ and $S^B(\phi') = S(\phi_B)$. Furthermore,

$$\max\left\{\max_{\phi'} S^R(\phi'),\ \max_{\phi'} S^B(\phi')\right\} = \max_\phi S(\phi),$$

and an optimal coloring on the left can be extended to an optimal coloring on the right in time O(deg(y)).

Defining Algorithm A in terms of these reductions is straightforward, and it should come as no surprise that the running time is polynomial in n and m, times 2 raised to the power of the number of times reduction III is employed. We now detail this.

Setup Phase: Choosing a Sequence of Reductions First, observe that the two problems generated by reduction III have different score sets, but the same underlying graph. Thus each of the three reductions, considering only the graphs and ignoring the scores, reduces a graph to a subgraph of smaller order.

Given an input graph G of order n, Algorithm A begins by constructing a sequence $G_1, G_2, \ldots, G_\ell$ of at most n graphs, where $G_1 = G$ is the input graph, each subsequent graph is a reduction of its predecessor graph (ignoring scores), and the final graph $G_\ell$ has no edges.

Specifically, with an ordering on the vertices of G: if G has minimum degree 1, apply reduction I to the first vertex of degree 1; if G has minimum degree 2, apply reduction II to the first vertex of degree 2; and otherwise, apply reduction III to the first vertex of maximum degree.

The precise running time of this setup procedure depends on the data structures employed, but it is clearly polynomial. By maintaining a list of vertices of each degree and the neighbors of each vertex, and by storing only the changes at each step rather than the new graph, the time can be limited to O(n + m) in the RAM model (where the length of an integer's binary representation is ignored).
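The setup phase looks only at the graph. The following is a Python sketch of it under stated assumptions: vertices are integers ordered by value, and a vertex that becomes isolated is simply discarded, since the text does not say how such vertices are handled. Counting type-III steps on $K_5$ reproduces the example discussed after Claim 7 (two type-III reductions for 10 edges).

```python
def reduction_sequence(adj):
    """Graph-only setup phase of Algorithm A (a sketch).

    adj maps each vertex (integers, ordered by value) to the set of its
    neighbours. Returns the list of (reduction_type, vertex) steps taken until
    no edges remain; isolated vertices are dropped as they appear."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    steps = []
    while any(g.values()):
        for v in [v for v, nbrs in g.items() if not nbrs]:
            del g[v]
        order = sorted(g)
        deg = {v: len(g[v]) for v in order}
        dmin, dmax = min(deg.values()), max(deg.values())
        if dmin <= 2:
            kind, target = dmin, dmin  # reduction I or II on the first vertex of that degree
        else:
            kind, target = 3, dmax     # reduction III on the first vertex of maximum degree
        y = next(v for v in order if deg[v] == target)
        nbrs = g.pop(y)
        for x in nbrs:
            g[x].discard(y)
        if kind == 2:                  # reduction II joins the two former neighbours
            x, z = nbrs
            g[x].add(z)
            g[z].add(x)
        steps.append((kind, y))
    return steps

def type_three_count(adj):
    """The number r(G) of type-III steps taken by the sketch above."""
    return sum(1 for kind, _ in reduction_sequence(adj) if kind == 3)

def complete_graph(n):
    return {i: {j for j in range(n) if j != i} for i in range(n)}
```

On $K_5$ the sequence is III, III, II, I; on $K_6$ there are three type-III steps, again exactly m/5. Trees and cycles need none.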
Solving the Tree of CSPs The sequence of graphs, along with another sequence specifying one binary value for each type-III reduction, determines a sequence of CSPs; the collection of all $2^r$ binary sequences (where r is the number of type-III reductions) naturally defines a tree of CSPs, having depth ℓ (we generate a child even for type-I and -II reductions) and $2^r$ leaves (each type-III reduction producing 2 children for each CSP in the current generation). Given an optimal solution to a CSP's child or children, an optimal solution to the CSP can be found by trying both extensions to the vertex "y", in time O(deg(y)).

Starting from the leaf problems and propagating their solutions upwards solves the original problem.

Analysis The foregoing procedure runs in time $O(m+n)2^r$. Moreover, the tree can be stored and traversed implicitly, as a path with nodes corresponding to the graph reductions and, at each type-III node, a state recording which of the two reductions is currently being explored, yielding a space bound of O(m + n). Thus we have the following lemma.

Lemma 4. Given a weighted MAX 2-CSP whose underlying graph G is connected, and an order on the vertices of G, Algorithm A returns an optimal solution in time $O(m+n)2^r$ and space O(m + n), where r = r(G) is the (order-dependent) number of type-III reductions taken for G.
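The graph sequence depends only on the graph, so the same computation can be phrased as a depth-first recursion. The recursion picks each reduction on the fly and, at a type-III vertex, tries both colors. The Python sketch below does this and returns only the optimal score, not the coloring. It copies score tables at every step, so it illustrates the tree of CSPs but does not achieve the O(m + n) space bound of Lemma 4. Colors R and B are encoded as 0 and 1, and all names are ours. An exhaustive search is included so that the sketch can check itself on small random instances.

```python
import itertools
import random

def edge(dyad, x, y, cx, cy):
    """s^{xy}_{cx cy}; each table is stored once, under the key (min, max)."""
    return dyad[(x, y)][cx][cy] if x < y else dyad[(y, x)][cy][cx]

def solve(s0, mono, dyad):
    """Optimal score of MAX(V, E, S) via reductions I-III (value only, recursive)."""
    adj = {v: set() for v in mono}
    for x, y in dyad:
        adj[x].add(y)
        adj[y].add(x)
    active = [v for v in sorted(adj) if adj[v]]
    if not active:  # no edges left: color every vertex independently
        return s0 + sum(max(m) for m in mono.values())
    deg = {v: len(adj[v]) for v in active}
    target = min(deg.values())
    if target >= 3:
        target = max(deg.values())
    y = next(v for v in active if deg[v] == target)
    nbrs = sorted(adj[y])
    mono2 = {v: list(m) for v, m in mono.items() if v != y}
    dyad2 = {k: [row[:] for row in t] for k, t in dyad.items() if y not in k}
    if len(nbrs) == 1:  # Reduction I
        x = nbrs[0]
        for c in (0, 1):
            mono2[x][c] += max(edge(dyad, x, y, c, d) + mono[y][d] for d in (0, 1))
        return solve(s0, mono2, dyad2)
    if len(nbrs) == 2:  # Reduction II (a < b because nbrs is sorted)
        a, b = nbrs
        old = dyad2.get((a, b), [[0, 0], [0, 0]])
        dyad2[(a, b)] = [[old[c][d] + max(edge(dyad, a, y, c, e) + edge(dyad, y, b, e, d)
                                          + mono[y][e] for e in (0, 1))
                          for d in (0, 1)] for c in (0, 1)]
        return solve(s0, mono2, dyad2)
    best = None  # Reduction III: branch on the color c of y
    for c in (0, 1):
        mono_c = {v: list(m) for v, m in mono2.items()}
        for x in nbrs:
            for d in (0, 1):
                mono_c[x][d] += edge(dyad, x, y, d, c)
        value = solve(s0 + mono[y][c], mono_c, dyad2)
        best = value if best is None else max(best, value)
    return best

def brute_force(s0, mono, dyad):
    vs = sorted(mono)
    best = None
    for cols in itertools.product((0, 1), repeat=len(vs)):
        phi = dict(zip(vs, cols))
        val = (s0 + sum(mono[v][phi[v]] for v in vs)
               + sum(t[phi[x]][phi[y]] for (x, y), t in dyad.items()))
        best = val if best is None else max(best, val)
    return best

def random_instance(rng, n, p):
    def r():
        return rng.randint(-3, 3)
    mono = {v: [r(), r()] for v in range(n)}
    dyad = {(x, y): [[r(), r()], [r(), r()]]
            for x in range(n) for y in range(x + 1, n) if rng.random() < p}
    return r(), mono, dyad
```

On $K_5$ with MAX CUT tables, `solve` returns 6, the maximum cut.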

3 Parametric Complexity 

The following theorem bounds the running time of Algorithm A in terms of 
parameters of the graph underlying the CSP. 

Theorem 5. Given a weighted MAX 2-CSP whose underlying graph G is connected and has order n, size m, and excess κ = m − n, Algorithm A returns an optimal solution in time $O(m+n)\min\{2^{m/5}, 2^{\kappa/2}\}$.

We remark that to prove our expected-time result (Theorem 3), we use only the $2^{\kappa/2}$ bound. However, the $2^{m/5}O(m+n)$ bound, for arbitrary MAX 2-CSPs, is of independent interest. For MAX CUT it improves on the $2^{m/4}\,\mathrm{poly}(m+n)$ of [KF02], and for MAX 2-SAT it matches the $2^{m/5}\,\mathrm{poly}(m+n)$ bound of [GHNR] (which also gave a $2^{m/3}\,\mathrm{poly}(m+n)$ bound for MAX CUT). These works also used algorithms based on reductions.

In light of Lemma 4, it suffices to prove that (for any order on the vertices of G) the number of type-III reduction steps r(G) is bounded by both m/5 and κ/2. These two claims are proved in the following two subsections.



3.1 Bounding in Terms of Excess 

Claim 6. For a connected graph G with excess κ, the number of type-III reduction steps of Algorithm A is r ≤ max{0, κ/2}.
Proof. The proof is by induction on the order of G. If G has excess 0 (it is unicyclic) or excess −1 (it is a tree), then type-I and -II reductions destroy all its edges, so r = 0.

Otherwise, the first type-III reduction reduces the number of edges by at least 3 and the number of vertices by exactly 1, thus reducing the excess to $\kappa' \le \kappa - 2$. If G' has components $G'_1, \ldots, G'_j$, then $r(G) = 1 + \sum_i r(G'_i)$. Given that we applied a type-III reduction, G had minimum degree ≥ 3, so G' has minimum degree ≥ 2. Thus each component $G'_i$ has minimum degree ≥ 2, and so excess $\kappa'_i \ge 0$. Then, by induction, $r(G) = 1 + \sum_i r(G'_i) \le 1 + \sum_i \kappa'_i/2 \le 1 + \kappa'/2 \le \kappa/2$. Note that the inductive step $r(G'_i) \le \kappa'_i/2$ used the fact that $\kappa'_i \ge 0$. □

3.2 Bounding in Terms of Size

Claim 7. For a graph G with m edges, the number of type-III reduction steps of Algorithm A is at most m/5.

Proof. Since type-I and type-II steps cannot increase the number of edges, it is enough to show that each type-III step, on average, reduces the number of edges by 5 or more. As long as the maximum degree is d ≥ 5 this is clear, since each type-III reduction immediately destroys d edges. Thus it suffices to consider graphs of maximum degree d ≤ 4; since the reductions never increase the degree of any vertex, the maximum degree will then remain at most 4.

Given a graph of maximum degree at most 4, suppose that Algorithm A performs r type-III reduction steps, consisting of $r_3$ reductions on vertices of degree 3, and $r_4^k$ reductions on vertices of degree 4 having k neighbors of degree 3 and 4 − k neighbors of degree 4. (If a neighbor had degree more than 4 we should have chosen it in preference to y; if it had degree 2 or less we should have applied a type-I or -II reduction instead.)

How many edges are destroyed by the $r = r_3 + \sum_{k=0}^4 r_4^k$ type-III reductions? Each "$r_3$-reduction" deletes the 3 edges incident on y, each of which went to a vertex also of degree 3 (with degree 4 or more we would have chosen it in preference to y, with degree 2 or less we would have applied a type-I or -II reduction), changing their degrees to 2 and subjecting each to a type-II reduction, and so destroying 3 more edges. (A type-II reduction destroys edges yx and yz, and creates edge xz only if it was not previously present, thus reducing the number of edges by at least 1, and possibly 2.) Similarly, each "$r_4^k$-reduction", on a degree-4 vertex adjacent to k degree-3 vertices, along with the k type-II reductions it sets up, destroys 4 + k edges. Thus the average number of edges destroyed per step is at least

$$\frac{6 r_3 + \sum_{k=0}^{4} (4+k)\, r_4^k}{r_3 + \sum_{k=0}^{4} r_4^k}. \qquad (1)$$

Clearly this ratio is at least 5 unless the value of $r_4^0$ can be made large, but we now show that the $r_4^k$ values must satisfy an additional condition which effectively prohibits this.
Note that each $r_3$-reduction decreases the number of degree-3 vertices by 4 (itself and its 3 neighbors), while each $r_4^k$-reduction decreases it by 2k − 4 (destroying k degree-3 neighbors, but also turning 4 − k old degree-4 neighbors into new degree-3 vertices). Type-I and -II reductions do not affect the number of degree-3 vertices. Since the number of degree-3 vertices is initially non-negative and finally 0, the decrease must be non-negative, i.e.,

$$\sum_{k=0}^{4} (2k-4)\, r_4^k + 4 r_3 \ge 0. \qquad (2)$$

Subject to the constraint given by (2), how small can the ratio (1) be? To be (slightly) pessimistic, we may let the values $r_3$ and $r_4^k$ range over the non-negative reals. Multiplying the set of values by any constant affects neither the constraint nor the ratio, so without loss of generality we may set the denominator of (1) to 1. That is, we add a constraint

$$r_3 + \sum_{k=0}^{4} r_4^k = 1, \qquad (3)$$

and minimize

$$6 r_3 + \sum_{k=0}^{4} (4+k)\, r_4^k. \qquad (4)$$

This is simply a linear program (LP) with objective function (4) and the two constraints (2) and (3). The LP's optimal objective value is 5, and the LP dual solution (1/4, 5) establishes 5 as a lower bound. That is, adding 1/4 times constraint (2) to 5 times constraint (3) gives

$$\frac{1}{4}\left(\sum_k (2k-4)\, r_4^k + 4 r_3\right) + 5\left(r_3 + \sum_k r_4^k\right) = 6 r_3 + \sum_k (4 + k/2)\, r_4^k \ge 5,$$

so (4), which is $6 r_3 + \sum_k (4+k)\, r_4^k$, must be at least this large.

This establishes that the number of edges destroyed by type-III reductions is at least 5 times the number of such reductions, concluding the proof. □
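The LP in (2) to (4) is small enough to check with exact rational arithmetic. The sketch below verifies that (1/4, 5) is dual feasible, so 5 is a lower bound on (4). It also exhibits one feasible point at which (4) equals 5: $r_3 = r_4^0 = 1/2$, with all other $r_4^k = 0$. That point is our own choice, derived from constraints (2) and (3), and it confirms that the optimum is exactly 5.

```python
from fractions import Fraction

KS = range(5)  # k = number of degree-3 neighbours of a reduced degree-4 vertex

def objective(r3, r4):
    """Expression (4): 6 r_3 + sum_k (4 + k) r_4^k."""
    return 6 * r3 + sum((4 + k) * r4[k] for k in KS)

def constraint2(r3, r4):
    """Left side of (2): sum_k (2k - 4) r_4^k + 4 r_3, required to be >= 0."""
    return sum((2 * k - 4) * r4[k] for k in KS) + 4 * r3

def dual_feasible(y2, y3):
    """Dual of: minimize (4) subject to (2) >= 0, (3) = 1, r >= 0.

    Needs y2 >= 0 and, for every primal variable, y2 * (its coefficient in (2))
    + y3 * (its coefficient in (3)) <= its coefficient in (4)."""
    return (y2 >= 0
            and 4 * y2 + y3 <= 6
            and all((2 * k - 4) * y2 + y3 <= 4 + k for k in KS))
```

The dual objective is y3 = 5. A primal point with value 5 therefore certifies optimality.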

We note that the upper bound of m/5 is achievable; that is, m/5 type-III reductions are needed by some graphs. An easy example is $K_5$, with 10 edges, reduced by two type-III reductions to $K_3$, and the latter reduced to the empty graph by type-I and -II reductions.



4 Stochastic Size and Excess of a Random Graph 

We stochastically bound the excess κ of a component of a random graph G through a standard "exposure" process. Given a graph G and a vertex $x_1$ in G, together with a linear order on the vertices of G, the exposure process finds a spanning tree of the component $G_1$ of G that contains $x_1$ and, in addition, counts the number of non-tree edges of $G_1$ (i.e., calculates the excess).
At each step of the process, vertices are classified as "living", "dead", and "unexplored", beginning with just $x_1$ living and all other vertices unexplored. At the ith step, the process takes the earliest living vertex $x_i$. All edges from $x_i$ to unexplored vertices are added to the spanning tree, and the number of non-tree edges is increased by 1 for each edge from $x_i$ to a living vertex. Unexplored vertices adjacent to $x_i$ are then reclassified as living, and $x_i$ is made dead. The
process terminates when there are no live vertices.
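Here is a direct Python transcription of the exposure process. It rests on two assumptions, because the text leaves both points open. First, "earliest" living vertex is read as the one with the smallest position in the linear order. Second, the width is recorded as the number of living vertices initially and after each step. The helper `gnp` draws a graph from G(n, p). The count of non-tree edges must equal the component's edge count minus (u − 1), which gives a simple consistency check.

```python
import heapq
import random

def explore(adj, root, rank):
    """Exposure process started at `root`.

    adj maps each vertex to its set of neighbours; rank gives each vertex's
    position in the linear order, and the living vertex of smallest rank is
    processed next. Returns (u, width, non_tree): the component order, the
    largest number of living vertices (recorded initially and after each
    step), and the number of non-tree edges found."""
    status = {v: "unexplored" for v in adj}
    status[root] = "living"
    living = [(rank[root], root)]
    u, width, non_tree = 1, 1, 0
    while living:
        _, x = heapq.heappop(living)
        for v in adj[x]:
            if status[v] == "living":
                non_tree += 1          # edge between two living vertices: not a tree edge
            elif status[v] == "unexplored":
                status[v] = "living"   # tree edge; v becomes living
                heapq.heappush(living, (rank[v], v))
                u += 1
        status[x] = "dead"
        width = max(width, len(living))
    return u, width, non_tree

def gnp(n, p, seed):
    """Adjacency sets of a random graph G(n, p)."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for x in range(n):
        for y in range(x + 1, n):
            if rng.random() < p:
                adj[x].add(y)
                adj[y].add(x)
    return adj

def component_size_and_edges(adj, root):
    """Order and size of the component containing root, by plain DFS."""
    seen, stack = {root}, [root]
    while stack:
        x = stack.pop()
        for v in adj[x]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen), sum(len(adj[v]) for v in seen) // 2
```

On a triangle with a pendant vertex, starting from a triangle vertex, the process finds u = 4, width 2, and one non-tree edge.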

Now suppose G is a random graph in G(n, c/n), with the vertices ordered at random. Let w(i) be the number of live vertices at the ith step and define the width $w = \max_i w(i)$. Let $u = |G_1|$, so that w(0) = 1 and w(u) = 0. The number of non-tree edges uncovered in the ith step is binomially distributed as $B(w(i) - 1, c/n)$, and so, conditioning on u and w(1), ..., w(u), the number of excess edges is distributed as $B\left(\sum_{i=1}^{u} (w(i) - 1), c/n\right)$. Since $\sum_{i=1}^{u} (w(i) - 1) < uw$, the (conditioned, and therefore also the unconditioned) number of excess edges is dominated by the random variable $B(uw, c/n)$.

At the ith stage of the process, there are at most n − i unexplored vertices, and so the number of new live vertices is dominated by $B(n - i, 1/n)$. Consider now a variant of the exposure process in which at each step we add enough special "red" vertices to bring the number of unexplored vertices to n − i. Let h(i) be the number of living vertices at the ith stage. Then h(0) = 1, and h(i) is distributed as $h(i-1) + B(n - i, c/n) - 1$. Let $X = n \wedge \min\{i : h(i) = 0\}$ and $H = \max_{i \le X} h(i)$.

By considering the second process as an extension of the first (and exploring the added vertices in the second process only when no other vertices remain), we obtain a coupling between the two processes such that u ≤ X and w ≤ H. Thus the excess of $G_1$ is dominated by $B(XH, 1/n)$.

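Since the running time of Algorithm A is at most $E(O(m+n)2^{\kappa/2})$, it can be bounded by the quantity $O(n^2)E(\sqrt{2}^{B(XH,1/n)})$. It is useful to note that if B is distributed as B(n, p), then

$$E(z^B) = \sum_k \binom{n}{k} z^k p^k (1-p)^{n-k} = (pz + (1-p))^n = (1 + p(z-1))^n \le \exp(p(z-1)n).$$

Taking $z = \sqrt{2}$ and $p = 1/n$, we may therefore concentrate on bounding quantities of the form $\Pr(X = x, H = h)\exp((\sqrt{2}-1)xh/n)$.

Lemma 8. With h(t) the random process defined above, for all times i = 1, 2, ..., parametrized as αn = i,

$$\Pr(h(\alpha n) \ge 0) \le \exp\left(-3\alpha^3 n/(24 - 8\alpha)\right). \qquad (5)$$

Furthermore, for any height h parametrized as h = βn, with $\alpha^2/(8-4\alpha) \le \beta \le \alpha$,

$$\Pr\left(\max_{t \le \alpha n} h(t) > \beta n \,\middle|\, h(\alpha n) = 0\right) \le O(n^{3/2}) \exp\left(-\left(\beta - \frac{\alpha^2}{4(2-\alpha)}\right)^2 \frac{7n}{8\alpha}\right). \qquad (6)$$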
In order to prove the lemma, we shall make use of the following fairly standard bound.

Claim 9. With $N = ni - \binom{i+1}{2}$, let $Z_1, Z_2, \ldots, Z_N$ be a random sequence of binomial random variables conditioned upon $\sum_{j=1}^{N} Z_j = i - 1$. Parametrize $i = \alpha n$. Suppose that β is in the range $\alpha^2/(8-4\alpha) \le \beta \le \alpha$, and $t \le i$. Then, writing $N' = nt - \binom{t+1}{2}$,

$$\Pr\left(\sum_{j=1}^{N'} Z_j > \beta n + (t-1)\right) \le O(\sqrt{n}) \exp\left(-\left(\beta - \frac{\alpha^2}{4(2-\alpha)}\right)^2 \frac{7n}{8\alpha}\right). \qquad (7)$$

We omit the proof.

Proof (of Lemma 8). We first prove (5). Note that

$$h(i) = B\left((n-1) + \cdots + (n-i), 1/n\right) - i + 1 = B\left(ni - \binom{i+1}{2}, 1/n\right) - i + 1,$$

and so $h(i) \ge 0$ means that

$$B\left(ni - \binom{i+1}{2}, 1/n\right) \ge i - 1 = \alpha n - 1. \qquad (8)$$

This binomial r.v. has expectation

$$\left(\alpha n^2 - \binom{\alpha n + 1}{2}\right)\Big/ n \le (\alpha - \alpha^2/2)\,n. \qquad (9)$$

Thus if (8) holds, the r.v. exceeds its expectation by at least $\alpha^2 n/2$ (up to an additive constant).

We use the inequality that for a sum X of independent 0-1 Bernoulli random variables with parameters $p_1, \ldots, p_N$ and expectation $\mu = \sum_i p_i$, $\Pr(X \ge \mu + t) \le \exp\left(-t^2/(2\mu + 2t/3)\right)$. Together with (9) this implies that (8) has probability at most $\exp\left(-(\alpha^4 n^2/4)/(2\alpha n(1 - \alpha/2) + \alpha^2 n/3)\right) = \exp\left(-3\alpha^3 n/(24 - 8\alpha)\right)$.

To prove (6), we bound the conditional probability

$$\Pr\left(\max_{t \le \alpha n} h(t) > \beta n \,\middle|\, h(\alpha n) = 0\right). \qquad (10)$$

In this part, rather than thinking of h(i) as $B(ni - \binom{i+1}{2}, 1/n) - i + 1$, we think of it as a sum of $N = ni - \binom{i+1}{2}$ independent Bernoulli random variables $Z_j$, each with distribution B(1/n), plus $-i + 1$. Note that, conditional on the sum of the $Z_j$'s, any particular assignment of 0s and 1s is equally likely: the collection of $Z_j$'s is a random binomial sequence conditioned upon $h(\alpha n) = 0$, i.e., upon having sum $\alpha n - 1$. We apply Claim 9 to show that for any given t, the probability of each of the events comprising that in (10) is bounded by (7), namely

$$\Pr\left(h(t) > \beta n \mid h(\alpha n) = 0\right) \le O(\sqrt{n}) \exp\left(-\left(\beta - \frac{\alpha^2}{4(2-\alpha)}\right)^2 \frac{7n}{8\alpha}\right).$$

Summing over $1 \le t = \gamma n \le \alpha n$, the required bound (6) follows. □
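Two steps used above are pure algebra. The first is the generating-function identity for a binomial variable, which turns $E(\sqrt{2}^{B(XH,1/n)})$ into a bound of the form $\exp((\sqrt{2}-1)XH/n)$. The second is the simplification of the Bernstein-type exponent in the proof of (5). The following Python check covers both; the helper names are ours. The identity is checked numerically, and the exponent is checked exactly with rational arithmetic.

```python
import math
from fractions import Fraction

def binomial_pgf(n, p, z):
    """E[z^B] for B ~ Bin(n, p), summed term by term."""
    return sum(math.comb(n, k) * (p * z) ** k * (1 - p) ** (n - k) for k in range(n + 1))

def bernstein_exponent(alpha, n):
    """t^2 / (2 mu + 2t/3) with t = alpha^2 n / 2 and mu = alpha n (1 - alpha/2)."""
    t = alpha ** 2 * n / 2
    mu = alpha * n * (1 - alpha / 2)
    return t ** 2 / (2 * mu + 2 * t / 3)
```

Passing `Fraction` values makes the second check exact. For example, α = 1 and n = 100 give 75/4 on both sides of $3\alpha^3 n/(24 - 8\alpha)$.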
Recall the random process h defined before Lemma 8, with stopping time X and maximum height H.

Lemma 10.

$$E\left[\exp\left((\sqrt{2} - 1)XH/n\right)\right] = O(n^{9/2}).$$

Proof. We show that each possible pair $X \in \{1, \ldots, n-1\}$ and $H \in \{1, \ldots, O(n^2)\}$ contributes at most $O(n^{3/2})$ to the expectation. Specifically, we show that for all α and β, $\exp((\sqrt{2}-1)\alpha\beta n)\Pr(X = \alpha n)\Pr(H = \beta n \mid X = \alpha n) = O(n^{3/2})$.

Case 1. If $\beta \le \alpha^2/(8 - 4\alpha)$ then, from Lemma 8,

$$\Pr(X = \alpha n) \le \Pr(h(\alpha n) = 0) \le \Pr(h(\alpha n) \ge 0) \le \exp\left(-3\alpha^3 n/(24 - 8\alpha)\right) \qquad (11)$$

and so

$$\exp((\sqrt{2}-1)\alpha\beta n)\Pr(X = \alpha n) \le \exp\left(\left(\frac{(\sqrt{2}-1)\alpha^3}{8 - 4\alpha} - \frac{3\alpha^3}{24 - 8\alpha}\right) n\right).$$

This is less than 1 provided that

$$\frac{\sqrt{2} - 1}{8 - 4\alpha} \le \frac{3}{24 - 8\alpha},$$

which is easily verified to hold for all $\alpha \in [0, 1]$.

Case 2. If $\beta > \alpha^2/(8 - 4\alpha)$ then, from Lemma 8, in addition to (11), we have that

$$\Pr(H = \beta n \mid X = \alpha n) \le \Pr(H \ge \beta n \mid X = \alpha n) \le O(n^{3/2}) \exp\left(-\left(\beta - \frac{\alpha^2}{4(2-\alpha)}\right)^2 \frac{7n}{8\alpha}\right).$$

So in this case it suffices to show that

$$\exp\left((\sqrt{2}-1)\alpha\beta n - \frac{3\alpha^3 n}{24 - 8\alpha} - \left(\beta - \frac{\alpha^2}{4(2-\alpha)}\right)^2 \frac{7n}{8\alpha}\right) \le 1, \qquad (12)$$

i.e., that

$$(\sqrt{2}-1)\alpha\beta - \frac{3\alpha^3}{24 - 8\alpha} - \left(\beta - \frac{\alpha^2}{4(2-\alpha)}\right)^2 \frac{7}{8\alpha} \qquad (13)$$

is at most 0.

For fixed $\alpha \in (0, 1]$, (13) is maximized by

$$\beta = \frac{\alpha^2}{4(2-\alpha)} + \frac{4(\sqrt{2}-1)\alpha^2}{7}.$$

Substituting this value of β into (13), and multiplying by the (positive) quantity $(\alpha - 2)(\alpha - 3)/\alpha^3$, gives a quadratic which is easily seen to be negative on (0, 1].

Thus, in both Case 1 and Case 2, for any α and β, the contribution of the $X = \alpha n$, $H = \beta n$ term to the expectation of $\exp((\sqrt{2}-1)XH/n)$ is at most $O(n^{3/2})$, and the sum of all $O(n^3)$ such contributions (recalling that X and H may take on O(n) and O(n²) possible values, respectively) is $O(n^{9/2})$. □
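The inequalities that close Cases 1 and 2 can be checked numerically on a grid of α values. The sketch below assumes that the quadratic term in (13) has coefficient 7/(8α), centred at $\alpha^2/(8-4\alpha)$, and evaluates the maximum over β in closed form at the vertex of the concave quadratic. It is a numerical sanity check, not a proof, and the names are ours.

```python
import math

SQ = math.sqrt(2) - 1

def case1_margin(a):
    """3/(24 - 8a) - (sqrt(2) - 1)/(8 - 4a); Case 1 needs this to be nonnegative."""
    return 3 / (24 - 8 * a) - SQ / (8 - 4 * a)

def case2_max(a):
    """Largest value over beta of (13) for fixed a in (0, 1], assuming (13) reads
    SQ*a*beta - 3a^3/(24 - 8a) - (7/(8a)) * (beta - a^2/(8 - 4a))^2."""
    beta0 = a * a / (8 - 4 * a)
    k = 7 / (8 * a)
    beta_star = beta0 + SQ * a / (2 * k)  # vertex of the concave quadratic in beta
    return SQ * a * beta_star - 3 * a ** 3 / (24 - 8 * a) - k * (beta_star - beta0) ** 2

GRID = [i / 1000 for i in range(1, 1001)]
```

At α = 1 the Case 2 maximum equals $(\sqrt{2}-1)/4 + 2(\sqrt{2}-1)^2/7 - 3/16 \approx -0.035$. Scaled by $\alpha^3$, it stays strictly negative across the whole grid.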

We can now prove Theorem 3. 

Proof (of Theorem 3). By Theorem 5 and the remarks before Lemma 8, Algorithm A runs in expected time $E(O(m+n)\sqrt{2}^{\kappa}) \le O(n^2)E(\sqrt{2}^{B(XH,1/n)}) \le O(n^2)E(\exp((\sqrt{2}-1)XH/n))$. But it follows from Lemma 10 that this is $O(n^{13/2})$. □



5 Conclusions 

In the present paper we focus on MAX CUT. Our result for "sparse" instances is strong in that it applies right up to c = 1, and we expect it could be extended through the scaling window, to $c = 1 + \lambda n^{-1/3}$ (at the expense of a constant factor depending on λ in the run time, and additional complication in the analysis). We also believe that our methods can be extended to MAX 2-SAT, but the analysis is certainly more complicated. In fact our results already apply to any MAX CSP,
and in particular to MAX 2-SAT, but only in the regime where there are about
n/2 clauses on n variables; since it is likely that random instances with up to 
about n clauses can be solved efficiently on average (the 2-SAT phase transition 
occurs around n clauses), our present result for MAX 2-SAT is relatively weak. 

Since MAX CUT is in general NP-hard (and even NP-hard to approximate
to better than a 16/17 factor [TSSW00]), it would be interesting to resolve
whether dense instances of MAX CUT as well as sparse ones can be solved in 
polynomial expected time (thus separating the average-case hardness from the 
worst-case hardness) or whether random dense instances are hard. Precisely the 
same questions can be asked about MAX 2-SAT, and in both cases we would guess 
that dense instances are hard, even on average. 
Abstract. Informally, a family F ⊆ S_n of permutations is k-restricted min-wise independent if for any X ⊆ [0, n − 1] with |X| ≤ k, each x ∈ X is equally likely to be mapped to the minimum among π(X). A family F ⊆ S_n of permutations is k-rankwise independent if for any X ⊆ [0, n − 1] with |X| ≤ k, the elements of X are equally likely to be mapped in any possible order. It has been shown that if a family F ⊆ S_n of permutations is k-restricted min-wise (resp. k-rankwise) independent, then |F| = Ω(n^{⌊(k−1)/2⌋}) (resp. |F| = Ω(n^{…})). In this paper, we construct families F ⊆ S_n of permutations whose sizes are close to those lower bounds for k = 3, 4. Specifically, we construct a family F ⊆ S_n of 3-restricted (resp. 4-restricted) min-wise independent permutations such that |F| = O(n lg² n) (resp. |F| = O(n lg³ n)) by applying the affine plane AG(2, q). We also construct a family F ⊆ S_n of 4-rankwise independent permutations such that |F| = O(n³ lg⁶ n) by applying the projective plane PG(2, q). Note that if a family F ⊆ S_n of permutations is 4-rankwise independent, then |F| = Ω(n^{…}). Since a family F ⊆ S_n of 4-rankwise independent permutations is 4-restricted min-wise independent, our family F ⊆ S_n of 4-restricted min-wise independent permutations is a witness that properly separates the notion of 4-rankwise independence from that of 4-restricted min-wise independence.



1 Introduction 

1.1 Definitions and Known Results 

The notion of k-restricted min-wise independence was introduced by Broder et al. [3] to estimate the resemblance between two documents [2] for detecting almost identical documents on the Web. A similar notion was implicitly used by Mulmuley to reduce the amount of randomness used by algorithms [9,4]. In fact, Broder et al. [3] showed that a family F ⊆ S_n of permutations precisely estimates the resemblance between two documents of size not greater than k > 1 iff it is k-restricted min-wise independent.

S. Arora et al. (Eds.): APPROX 2003+RANDOM 2003, LNCS 2764, pp. 396-408, 2003.

For any pair of integers i ≤ j, let [i, j] = {i, i + 1, ..., j}. We use S_n to denote the set of all permutations on [0, n − 1] and use |A| to denote the cardinality of a finite set A. Broder et al. [3] defined the notion of k-restricted min-wise independent permutations as follows:

Definition 1 ([3]). A family F ⊆ S_n of permutations is said to be k-restricted min-wise independent if for any subset X ⊆ [0, n − 1] with |X| ≤ k and any x ∈ X, Pr[min{π(X)} = π(x)] = 1/|X| when π ∈ F is chosen uniformly at random.

Itoh, Takei, and Tarui [6] showed how to construct a family F_n ⊆ S_n of k-restricted min-wise independent permutations such that |F_n| ≤ (2n)^{…}·lcm(k, k − 1, ..., 1). For the case that a biased (rather than uniform) sampling of permutations is allowed, Broder et al. [3] showed that there exists a family F_n ⊆ S_n of k-restricted min-wise independent permutations such that |F_n| ≤ … (and this was improved to 1 + … by Matoušek and Stojaković [10]).

For any X ⊆ [0, n − 1] and any x ∈ X, let LT(x, X) = {y ∈ X : y < x} and define the rank of x in X by RANK{x, X} = |LT(x, X)|. Itoh, Takei, and Tarui [6] defined the following notion, which is stronger than k-restricted min-wise independence.

Definition 2 ([6]). A family F ⊆ S_n of permutations is said to be k-rankwise independent if for any subset X = {x_1, x_2, ..., x_k} ⊆ [0, n − 1] and any k distinct values r_1, r_2, ..., r_k ∈ [0, k − 1], Pr[⋀_{i=1}^{k} RANK{π(x_i), π(X)} = r_i] = 1/k! when π ∈ F is chosen uniformly at random.
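Both definitions can be checked by brute force on small families. The sketch below is our own illustration, and the function names are hypothetical. A permutation π ∈ S_n is stored as a tuple p with p[x] = π(x). A family is a list of such tuples, so repeated members count with multiplicity, just as in uniform sampling from F. Definition 2 is checked on k-subsets only, because uniform orderings of every k-subset imply uniform orderings of its smaller subsets.

```python
from collections import Counter
from itertools import combinations, permutations
from math import factorial


def is_k_restricted_minwise(family, n, k):
    """Definition 1: for every X with 2 <= |X| <= k, each x in X attains
    min pi(X) for exactly len(family) / |X| members of the family."""
    size = len(family)
    for r in range(2, min(k, n) + 1):
        if size % r:
            return False
        for X in combinations(range(n), r):
            wins = Counter(min(X, key=p.__getitem__) for p in family)
            if any(wins[x] != size // r for x in X):
                return False
    return True


def is_k_rankwise(family, n, k):
    """Definition 2: for every k-subset X, each of the k! relative orders of
    pi(X) occurs for exactly len(family) / k! members of the family."""
    size, kf = len(family), factorial(k)
    if size % kf:
        return False
    for X in combinations(range(n), k):
        orders = Counter(tuple(sorted(X, key=p.__getitem__)) for p in family)
        if len(orders) != kf or any(c != size // kf for c in orders.values()):
            return False
    return True


n = 5
identity = tuple(range(n))
reverse = tuple(n - 1 - x for x in range(n))  # the reverse permutation RV_n
print(is_k_restricted_minwise([identity, reverse], n, 2))  # True
print(is_k_restricted_minwise([identity, reverse], n, 3))  # False
print(is_k_rankwise(list(permutations(range(4))), 4, 4))   # True: S_4
```

The pair {id, RV_n} is 2-restricted min-wise independent but not 3-restricted, because the middle element of a 3-set is never mapped to the minimum.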

Itoh, Takei, and Tarui [6] showed how to construct a family F_n ⊆ S_n of k-rankwise independent permutations such that |F_n| ≤ … if (k − 1)! ≤ n.

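For any pair of integers n > d > 0, define m(n, d) to be

$$m(n,d)=\begin{cases}\displaystyle\sum_{i=0}^{d/2}\binom{n}{i} & \text{if } d \text{ is even};\\[6pt] \displaystyle\sum_{i=0}^{(d-1)/2}\binom{n}{i}+\binom{n-1}{(d-1)/2} & \text{if } d \text{ is odd}.\end{cases}$$

For the lower bounds on the family size of k-restricted min-wise and k-rankwise independent permutations, we have the following results (for related work, see [3,6,11,10]).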
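The quantity m(n, d) is easy to tabulate. The following is a minimal sketch; the function name m is ours. It also confirms the two special values used later: m(n − 1, 2) = n in Section 3 and m(n − 1, 3) = 2n − 2 in Section 4.

```python
from math import comb


def m(n, d):
    """m(n, d) as defined above (n >= d >= 0)."""
    if d % 2 == 0:
        return sum(comb(n, i) for i in range(d // 2 + 1))
    h = (d - 1) // 2
    return sum(comb(n, i) for i in range(h + 1)) + comb(n - 1, h)


for n in range(4, 50):
    assert m(n - 1, 3 - 1) == n          # Theorem 1 with k = 3
    assert m(n - 1, 4 - 1) == 2 * n - 2  # Theorem 1 with k = 4
print(m(9, 2), m(9, 3))  # 10 18
```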



Theorem 1 ([7]). For any pair of integers n > k > 1, if a family F ⊆ S_n of permutations is k-restricted min-wise independent, then |F| ≥ m(n − 1, k − 1).

Theorem 2 ([7]). For any pair of integers n > k > 1, let s = ⌊k/2⌋. If a family F ⊆ S_n of permutations is k-rankwise independent, then |F| ≥ m(n − 1, k − 1) if k ≤ 3, and |F| ≥ max{m(n − 1, k − 1), ⌊n/s⌋^{…}} if k ≥ 4.

1.2 Main Results 

In this paper, we construct families F_n ⊆ S_n of 3-restricted and 4-restricted min-wise independent permutations by applying the affine plane AG(2, q).

Theorem 3. For any integer n > 3, there exists a 3-restricted min-wise independent permutation family F_n ⊆ S_n such that |F_n| ≤ 12√e(1 + o(1))·n lg² n.

Theorem 4. For any integer n > 4, there exists a 4-restricted min-wise independent permutation family F_n ⊆ S_n such that |F_n| ≤ 12√e(1 + o(1))·n lg³ n.

So from Theorem 1, it follows that if a family F ⊆ S_n of permutations is 3-restricted or 4-restricted min-wise independent, then |F| = Ω(n). Thus the upper bounds of Theorems 3 and 4 are within a poly(lg n) factor of the lower bound of Theorem 1. We also construct a family F_n ⊆ S_n of 4-rankwise independent permutations by applying the projective plane PG(2, q).

Theorem 5. For any integer n > 4, there exists a 4-rankwise independent permutation family F_n ⊆ S_n such that |F_n| ≤ 15e(1 + o(1))·n³ lg⁶ n.

From Theorem 2, it follows that if a family F ⊆ S_n of permutations is 4-rankwise independent, then |F| = Ω(n^{…}). Thus the result of Theorem 5 is close to that of Theorem 2. Note that any family F ⊆ S_n of permutations that is k-rankwise independent is also k-restricted min-wise independent. So it follows from Theorem 2 that our family F_n ⊆ S_n of 4-restricted min-wise independent permutations given in Theorem 4 is a witness that properly separates the notion of 4-rankwise independence from that of 4-restricted min-wise independence.



2 Preliminaries 



We use RV_n ∈ S_n to denote the reverse permutation, i.e., for each x ∈ [0, n − 1], RV_n(x) = n − 1 − x. For each π ∈ S_n, let RV_n^0 ∘ π = π and RV_n^1 ∘ π = RV_n ∘ π. Let m > n > 1 be integers and π ∈ S_m. Define π̂ : [0, n − 1] → [0, n − 1] such that for each x ∈ [0, n − 1], π̂(x) = RANK{π(x), [0, m − 1] − π([n, m − 1])}. Note that π̂ ∈ S_n, and we use T_{m,n} to denote the transform S_m ∋ π ↦ π̂ ∈ S_n.

Proposition 1 ([6]). For any integer m > 0, let F ⊆ S_m be a family of k-restricted min-wise (resp. k-rankwise) independent permutations. For any integer n ≤ m, let G = {π̂ : π̂ = T_{m,n}π, π ∈ F}. Then the family G ⊆ S_n of permutations is k-restricted min-wise (resp. k-rankwise) independent and |G| = |F|.
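Because π is a bijection, [0, m − 1] − π([n, m − 1]) = π([0, n − 1]). So T_{m,n}π simply replaces each value π(x), for x < n, by its rank among π(0), …, π(n − 1). The following is a small sketch of T_{m,n}; the name restrict is ours. It keeps the family as a multiset, so that |G| = |F| as in Proposition 1.

```python
from collections import Counter
from itertools import permutations


def restrict(pi, n):
    """T_{m,n}: the permutation of [0, n-1] given by the ranks of pi(0), ..., pi(n-1)."""
    rank = {v: r for r, v in enumerate(sorted(pi[:n]))}
    return tuple(rank[pi[x]] for x in range(n))


G = [restrict(pi, 3) for pi in permutations(range(5))]
print(len(G))                        # 120, the same size as S_5
print(set(Counter(G).values()))      # {20}: each element of S_3 occurs 5!/3! times
print(restrict((4, 3, 2, 1, 0), 3))  # (2, 1, 0): RV_5 restricts to RV_3
```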

Our constructions of permutation families are recursive, and the rest of this section is concerned with technicalities necessary for the analysis of the recursion. The discussion below is not needed to understand the constructions, and readers may skip this part. Let g : Z → Z be a (partial) function such that for any integer q' = 2^t, g(q') = √q' if t is even and g(q') = √(2q') otherwise. For any integer q = 2^t > 4, we define a sequence of integers q_ℓ = q, q_{ℓ−1} = g(q_ℓ), ..., q_1 = g(q_2) = 4 = 2². Note that ℓ − 1 ≤ lg t = lg lg q. For each i ∈ [1, ℓ − 1], we have two cases: (i) q_{i+1} = q_i²; (ii) 2q_{i+1} = q_i². Notice that for any integer q = 2^{2^t} > 4, case (ii) never occurs. In general, for any integer q = 2^t > 4, case (ii) occurs at most (lg lg q) − 1 times. Thus

$$\prod_{i=1}^{\ell-1} q_i = \frac{q_\ell}{q_1}\prod_{i=1}^{\ell-1}\frac{q_i^2}{q_{i+1}} \le \begin{cases}\dfrac{q}{4}\cdot 2^{\lg\lg q-1} = \dfrac{q}{8}\lg q & \text{if } q = 2^t > 4;\\[6pt] \dfrac{q}{4} & \text{if } q = 2^{2^t} > 4.\end{cases} \qquad (1)$$
From the definition of the sequence we have that q_i ≥ 2^{i+1}. Thus

$$\prod_{i=1}^{\ell-1}\left(1+\frac{1}{q_i}\right) \le \prod_{i=1}^{\ell-1}\left(1+\frac{1}{2^{i+1}}\right) < e^{\sum_{i\ge 1}2^{-(i+1)}} = \sqrt{e}. \qquad (2)$$
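The sequence q_1, …, q_ℓ is easy to generate. The sketch below lists it for q = 2^t; its helper names are ours. For a range of t, it checks four things:

- the identity ∏_{i=1}^{ℓ−1} q_i = (q/4)·2^{(number of case-(ii) steps)}, which underlies Eq. (1);
- the fact that no case-(ii) step occurs when q = 2^{2^t};
- the bound ℓ − 1 ≤ lg lg q;
- the inequality (2).

```python
import math


def g(qp):
    """The partial function g of Section 2 on powers of two q' = 2^t."""
    t = qp.bit_length() - 1
    assert qp == 1 << t, "g is only defined on powers of two"
    return 1 << (t // 2) if t % 2 == 0 else 1 << ((t + 1) // 2)


def sequence(q):
    """[q_1, ..., q_ell] with q_ell = q, q_{i-1} = g(q_i), and q_1 = 4 (q = 2^t > 4)."""
    seq = [q]
    while seq[-1] > 4:
        seq.append(g(seq[-1]))
    return seq[::-1]


def case_ii_steps(seq):
    """Number of i with 2 q_{i+1} = q_i^2 (case (ii)); otherwise q_{i+1} = q_i^2."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a * a == 2 * b)


print(sequence(2 ** 9))  # [4, 8, 32, 512]

for t in range(3, 64):
    q = 2 ** t
    seq = sequence(q)
    inner = seq[:-1]  # q_1, ..., q_{ell-1}
    assert math.prod(inner) == (q // 4) * 2 ** case_ii_steps(seq)
    assert len(seq) - 1 <= math.log2(t)  # ell - 1 <= lg lg q
    assert math.prod(1 + 1 / x for x in inner) < math.sqrt(math.e)  # Eq. (2)

for t in range(2, 7):  # q = 2^{2^t}: only case (i) occurs, so the product is q/4
    assert case_ii_steps(sequence(2 ** 2 ** t)) == 0
```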



3 3-Restricted Min-Wise Independent Permutations

For a family F ⊆ S_n of 3-restricted min-wise independent permutations, it follows from Theorem 1 that |F| ≥ m(n − 1, 3 − 1) = n. To show Theorem 3, we construct a family F_n ⊆ S_n of 3-restricted min-wise independent permutations such that |F_n| = n^{1+o(1)}, which is close to the result of Theorem 1.



3.1 Affine Planes

Let F_q be a field of q elements, where q is a prime power. An affine plane AG(2, q) is a 2-dimensional vector space over F_q consisting of q² points and q² + q lines [8]. In AG(2, q), there exist q + 1 parallel line classes 𝒞_q = {C_0, C_1, ..., C_q}. Note that each C_i ∈ 𝒞_q contains q parallel lines, i.e., C_i = {L_0^i, L_1^i, ..., L_{q−1}^i}, and each L_j^i ∈ C_i has q points, i.e., L_j^i = {p_{j,0}^i, p_{j,1}^i, ..., p_{j,q−1}^i}. For simplicity, we identify F_q with [0, q − 1]. Arrange the q² points of AG(2, q) in a natural manner. For each parallel line class C_i ∈ 𝒞_q, let f_L^i, f_P^i : AG(2, q) → [0, q − 1] be functions such that for each x ∈ AG(2, q), f_L^i(x) = u and f_P^i(x) = v if x = p_{u,v}^i, i.e., there exists a line L_u^i ∈ C_i in which x = p_{u,v}^i ∈ L_u^i. For each parallel line class C_i ∈ 𝒞_q, we have that for any pair of x, y ∈ AG(2, q), x ≠ y iff (f_L^i(x), f_P^i(x)) ≠ (f_L^i(y), f_P^i(y)).

3.2 Recursive Construction: 3-Restricted Min-Wise Independence

For any t ≥ 2, let q = 2^t. Assume that there exists a family G_q ⊆ S_q of 3-restricted min-wise independent permutations. Our construction of 3-restricted min-wise independent permutations works as follows. For each parallel line class C_i ∈ 𝒞_q, the q lines L_j^i ∈ C_i are permuted in a 3-restricted min-wise independent manner. The q points p_{j,k}^i ∈ L_j^i are also permuted in a 3-restricted min-wise independent manner, and those permuted q points in L_j^i are reversed. More formally,

(1) For each parallel line class C_i ∈ 𝒞_q, each π ∈ G_q, and each X ∈ {0, 1}, define σ : [0, q² − 1] → [0, q² − 1] such that for each x ∈ [0, q² − 1], σ(x) = π(f_L^i(x))·q + RV_q^X ∘ π(f_P^i(x)).

(2) Let G_{q²} = {σ : i ∈ [0, q], π ∈ G_q, X ∈ {0, 1}}.

It is not difficult to see that G_{q²} is a permutation family, i.e., G_{q²} ⊆ S_{q²}. To show Theorem 3, the following lemma is applied recursively.

Lemma 1. For any prime power q, if a family G_q ⊆ S_q of permutations is 3-restricted min-wise independent, then the family G_{q²} ⊆ S_{q²} of permutations is 3-restricted min-wise independent.
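Lemma 1 can be sanity-checked by brute force for a small field. The sketch below is our own illustration and makes one simplifying assumption: q is prime, so F_q is plain arithmetic mod q. The recursion in the text uses q = 2^t, which would need GF(2^t) arithmetic instead. The point (x, y) gets index x·q + y. Class i < q consists of the lines y = ix + c, with f_L^i(x, y) = c and f_P^i(x, y) = x. Class q consists of the vertical lines, with f_L^q(x, y) = x and f_P^q(x, y) = y. Starting from G_q = S_q, the sketch builds G_{q²} by step (1), taken here as σ(x) = π(f_L^i(x))·q + RV_q^X ∘ π(f_P^i(x)). It then checks Definition 1 for k = 3 exhaustively.

```python
from collections import Counter
from itertools import combinations, permutations, product


def affine_coords(q):
    """coords[i][x*q + y] = (f_L^i, f_P^i) of the point (x, y) in AG(2, q), q prime."""
    return [[(x, y) if i == q else ((y - i * x) % q, x)
             for x, y in product(range(q), repeat=2)]
            for i in range(q + 1)]


def build_3_restricted(q, base):
    """sigma(x) = pi(f_L^i(x)) * q + RV_q^X(pi(f_P^i(x))) for all i, pi in base, X."""
    family = []
    for row in affine_coords(q):
        for pi in base:
            for X in (0, 1):
                family.append(tuple(pi[L] * q + (q - 1 - pi[P] if X else pi[P])
                                    for L, P in row))
    return family


def is_k_restricted_minwise(family, n, k):
    """Definition 1, checked over all subsets of size 2..k."""
    size = len(family)
    for r in range(2, k + 1):
        for X in combinations(range(n), r):
            wins = Counter(min(X, key=p.__getitem__) for p in family)
            if size % r or any(wins[x] != size // r for x in X):
                return False
    return True


q = 3
G = build_3_restricted(q, list(permutations(range(q))))  # G_q = S_q
print(len(G), is_k_restricted_minwise(G, q * q, 3))       # 48 True
```

The size 48 = 2·(q + 1)·|S_3| matches |G_{q²}| = 2(q + 1)|G_q|, the recurrence used in Subsection 3.4.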
3.3 Proof of Lemma 1

For any distinct points x_1, x_2, x_3 ∈ [0, q² − 1], consider the following cases:

(i) there exist a unique parallel line class C_a ∈ 𝒞_q and a unique line L_j^a ∈ C_a such that x_1, x_2, x_3 ∈ L_j^a, i.e., x_1, x_2, x_3 are collinear;
(ii) for any parallel line class C_a ∈ 𝒞_q, there exists no line L_j^a ∈ C_a such that x_1, x_2, x_3 ∈ L_j^a, i.e., x_1, x_2, x_3 are in general position.

For each 1 ≤ h ≤ 3, let A_h = {σ ∈ G_{q²} : min{σ(x_1), σ(x_2), σ(x_3)} = σ(x_h)} be the event when σ ∈ G_{q²} is chosen uniformly at random. For the proof of Lemma 1, it suffices to show the following claims.

Claim 31. For the case (i), Pr[A_1] = Pr[A_2] = Pr[A_3] = 1/3.

Proof: For each i ∈ [0, q], we have the following events: (i-1) i = a; (i-2) i ≠ a (see Figure 1).

For the event (i-1), there exists a unique line L_j^a ∈ C_a such that x_1, x_2, x_3 ∈ L_j^a. A family G_q ⊆ S_q of k-restricted min-wise independent permutations is k-restricted max-wise independent [5, Theorem 2]. Hence any of the points x_1, x_2, x_3 ∈ [0, q² − 1] can be the minimum with probability 1/3 both under π ∈ G_q and under RV_q ∘ π. So it follows that any of the points x_1, x_2, x_3 can be the minimum with probability 1/3 under σ ∈ G_{q²}. So for each 1 ≤ h ≤ 3, Pr[A_h ∧ event (i-1)] = (1/(q + 1))·(1/3).

For the event (i-2), we have that x_1 ∈ L_{j_1}^i, x_2 ∈ L_{j_2}^i, and x_3 ∈ L_{j_3}^i, where j_1, j_2, j_3 ∈ [0, q − 1] are distinct. Since f_L^i(x_1) = j_1, f_L^i(x_2) = j_2, and f_L^i(x_3) = j_3, any of j_1, j_2, j_3 can be the minimum with probability 1/3 under π ∈ G_q. Then any of the points x_1, x_2, x_3 can be the minimum with probability 1/3 under σ ∈ G_{q²}. So for each 1 ≤ h ≤ 3, Pr[A_h ∧ event (i-2)] = (q/(q + 1))·(1/3).

Thus we have that Pr[A_1] = Pr[A_2] = Pr[A_3] = 1/3. ■

Claim 32. For the case (ii), Pr[A_1] = Pr[A_2] = Pr[A_3] = 1/3.

Proof: For each i ∈ [0, q], we have the following events: (ii-1) x_1, x_2, x_3 ∈ [0, q² − 1] are on three different lines in C_i; (ii-2) exactly two of x_1, x_2, x_3 are on the same line in C_i (see Figure 2).

For the event (ii-1), we can show that any of the points x_1, x_2, x_3 can be the minimum with probability 1/3 under σ ∈ G_{q²}, in a way similar to the event (i-2) of Claim 31. So for each 1 ≤ h ≤ 3, Pr[A_h ∧ event (ii-1)] = ((q − 2)/(q + 1))·(1/3).

For the event (ii-2), consider the following subevents, for distinct i_1, i_2, i_3 ∈ [0, q]:

(ii-2.1) x_1 ∉ L_{j_1}^{i_1} and x_2, x_3 ∈ L_{j_1}^{i_1};
(ii-2.2) x_2 ∉ L_{j_2}^{i_2} and x_1, x_3 ∈ L_{j_2}^{i_2};
(ii-2.3) x_3 ∉ L_{j_3}^{i_3} and x_1, x_2 ∈ L_{j_3}^{i_3}.

For the subevent (ii-2.1), the point x_1 can be the minimum with probability 1/2. For the subevents (ii-2.2) and (ii-2.3), the point x_1 can be the minimum with probability (1/2)² = 1/4. So we have that Pr[A_1 ∧ event (ii-2)] = (1/(q + 1))·(1/2 + 1/4 + 1/4). By a similar argument, Pr[A_2 ∧ event (ii-2)] = Pr[A_3 ∧ event (ii-2)] = (1/(q + 1))·(1/2 + 1/4 + 1/4). Thus Pr[A_1] = Pr[A_2] = Pr[A_3] = 1/3. ■

Fig. 1. Events for the Case (i): 3-Restricted Min-Wise Independence (panels (i-1) and (i-2))

Fig. 2. Events for the Case (ii) (panels (ii-1) and (ii-2))



3.4 Proof of Theorem 3

For any integer q = 2^t > 4, define a sequence {q_i}_{i∈[1,ℓ]} of integers by the function g defined in Section 2. As in Subsection 3.2, construct a family G_{q_{i−1}²} ⊆ S_{q_{i−1}²} of 3-restricted min-wise independent permutations from the family G_{q_{i−1}} ⊆ S_{q_{i−1}} of 3-restricted min-wise independent permutations. Note that for each i ∈ [2, ℓ], q_i ≤ q_{i−1}². By Proposition 1, transform the family G_{q_{i−1}²} ⊆ S_{q_{i−1}²} into a family G_{q_i} ⊆ S_{q_i} of 3-restricted min-wise independent permutations. We can start with any family G_{q_1} ⊆ S_{q_1} = S_4 of 3-restricted min-wise independent permutations. Here we take G_{q_1} = S_4, i.e., |G_{q_1}| = |S_4| = 4! = 24. Recall that ℓ − 1 ≤ lg lg q. So we have that

$$|G_{q_\ell}| = 2(q_{\ell-1}+1)\cdot|G_{q_{\ell-1}}| = 2^{\ell-1}\prod_{i=1}^{\ell-1}(q_i+1)\cdot|G_{q_1}| \le 24\cdot\lg q\cdot\prod_{i=1}^{\ell-1}q_i\cdot\prod_{i=1}^{\ell-1}\left(1+\frac{1}{q_i}\right). \qquad (3)$$



For any integer n > 4, let q = 2^t be the minimum integer such that n ≤ q. Note that q < 4n. So from Ineq. (3), Eqs. (1) and (2), and Proposition 1, it follows that for any integer n > 4, there exists a family F_n ⊆ S_n of 3-restricted min-wise independent permutations such that

$$|F_n| = |G_q| \le 3\sqrt{e}\cdot q\lg^2 q < 12\sqrt{e}\cdot n(2+\lg n)^2 = 12\sqrt{e}(1+o(1))\cdot n\lg^2 n.$$

In particular, we have the following corollary from Ineq. (3) and Eqs. (1) and (2).

Corollary 1. For any integer n = 2^{2^t} > 4, there exists a family F_n ⊆ S_n of 3-restricted min-wise independent permutations such that |F_n| ≤ 6√e·n lg n.
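For q = 2^{2^t}, every step of the recursion is a squaring, so the exact family sizes are easy to tabulate. The following is a small sketch; the helper name is ours. It computes |G_q| from |G_{q²}| = 2(q + 1)|G_q| and |G_4| = 24, following Subsection 3.2 and the first equality in (3). It then compares the sizes with the bound of Corollary 1.

```python
import math


def minwise_sizes(steps, factor):
    """[(q, |G_q|)] for q = 4, 16, 256, ..., using |G_{q^2}| = factor * (q + 1) * |G_q|
    and |G_4| = |S_4| = 24 (factor = 2 for the construction of Subsection 3.2)."""
    q, size = 4, 24
    table = [(q, size)]
    for _ in range(steps):
        q, size = q * q, factor * (q + 1) * size
        table.append((q, size))
    return table


sizes = minwise_sizes(3, factor=2)
print(sizes)  # [(4, 24), (16, 240), (256, 8160), (65536, 4194240)]
for n, size in sizes[1:]:  # Corollary 1 needs n > 4
    assert size <= 6 * math.sqrt(math.e) * n * math.log2(n)
```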



4 4-Restricted Min-Wise Independent Permutations

For a family F ⊆ S_n of 4-restricted min-wise independent permutations, it follows from Theorem 1 that |F| ≥ m(n − 1, 4 − 1) = 2n − 2. To show Theorem 4, we construct a family F_n ⊆ S_n of 4-restricted min-wise independent permutations such that |F_n| = n^{1+o(1)}, which is close to the result of Theorem 1. For the case k = 4, Theorem 2 implies that if a family F ⊆ S_n of permutations is 4-rankwise independent, then |F| = Ω(n^{…}). Thus our family F_n ⊆ S_n of permutations given in Theorem 4 shows that the notion of 4-rankwise independence is strictly stronger than that of 4-restricted min-wise independence.

4.1 Recursive Construction: 4-Restricted Min-Wise Independence

For any t ≥ 2, let q = 2^t. Assume that there exists a family G_q ⊆ S_q of 4-restricted min-wise independent permutations. Our construction of 4-restricted min-wise independent permutations is similar to that of 3-restricted min-wise independent permutations in Subsection 3.2. It works as follows. For each parallel line class C_i ∈ 𝒞_q, the q lines L_j^i ∈ C_i are permuted in a 4-restricted min-wise independent manner, and those permuted q lines in C_i are reversed. The q points p_{j,k}^i ∈ L_j^i are permuted in a 4-restricted min-wise independent manner, and those permuted q points in L_j^i are reversed. More formally,

(1) For each parallel line class C_i ∈ 𝒞_q, each π ∈ G_q, and each X, Y ∈ {0, 1}, define σ : [0, q² − 1] → [0, q² − 1] such that for each x ∈ [0, q² − 1], σ(x) = RV_q^X ∘ π(f_L^i(x))·q + RV_q^Y ∘ π(f_P^i(x)).

(2) Let G_{q²} = {σ : i ∈ [0, q], π ∈ G_q, X ∈ {0, 1}, Y ∈ {0, 1}}.

It is not difficult to see that G_{q²} is a permutation family, i.e., G_{q²} ⊆ S_{q²}. To show Theorem 4, the following lemma is applied recursively.

Lemma 2. For any prime power q, if a family G_q ⊆ S_q of permutations is 4-restricted min-wise independent, then the family G_{q²} ⊆ S_{q²} of permutations is 4-restricted min-wise independent.
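As with Lemma 1, Lemma 2 can be checked exhaustively for a tiny prime q. The sketch makes the same assumptions as before: arithmetic mod a prime q instead of GF(2^t), the same line and point coordinates, and helper names of our own. Step (1) is taken as σ(x) = RV_q^X ∘ π(f_L^i(x))·q + RV_q^Y ∘ π(f_P^i(x)). For q = 3, the base family G_3 = S_3 is trivially 4-restricted min-wise independent, since [0, 2] has no 4-subsets. The check covers all subsets of size at most 4 of the 9 points.

```python
from collections import Counter
from itertools import combinations, permutations, product


def affine_coords(q):
    """coords[i][x*q + y] = (f_L^i, f_P^i) of the point (x, y) in AG(2, q), q prime."""
    return [[(x, y) if i == q else ((y - i * x) % q, x)
             for x, y in product(range(q), repeat=2)]
            for i in range(q + 1)]


def build_4_restricted(q, base):
    """sigma(x) = RV_q^X(pi(f_L^i(x))) * q + RV_q^Y(pi(f_P^i(x))) for all i, pi, X, Y."""
    def rv(v, bit):  # RV_q^bit applied to v
        return q - 1 - v if bit else v

    family = []
    for row in affine_coords(q):
        for pi in base:
            for X, Y in product((0, 1), repeat=2):
                family.append(tuple(rv(pi[L], X) * q + rv(pi[P], Y) for L, P in row))
    return family


def is_k_restricted_minwise(family, n, k):
    """Definition 1, checked over all subsets of size 2..k."""
    size = len(family)
    for r in range(2, k + 1):
        for X in combinations(range(n), r):
            wins = Counter(min(X, key=p.__getitem__) for p in family)
            if size % r or any(wins[x] != size // r for x in X):
                return False
    return True


q = 3
G = build_4_restricted(q, list(permutations(range(q))))  # G_3 = S_3
print(len(G), is_k_restricted_minwise(G, q * q, 4))       # 96 True
```

Compared with Subsection 3.2, the extra bit X reverses the order of the lines. This decouples the comparison between lines from the comparison between points on a line, which is the step that the four-point cases rely on.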

4.2 Proof of Lemma 2

For any distinct points x_1, x_2, x_3, x_4 ∈ [0, q² − 1], consider the following cases:

(i) there exists a unique line L_j^a ∈ C_a such that x_1, x_2, x_3, x_4 ∈ L_j^a, i.e., x_1, x_2, x_3, x_4 are collinear;
(ii) there exists a unique line L_j^a ∈ C_a containing only three points x_{h_1}, x_{h_2}, x_{h_3} ∈ {x_1, x_2, x_3, x_4}, i.e., x_{h_1}, x_{h_2}, x_{h_3} are collinear;
(iii) for any parallel line class C_a ∈ 𝒞_q, there exists no line L_j^a ∈ C_a such that x_{h_1}, x_{h_2}, x_{h_3} ∈ L_j^a for some three points x_{h_1}, x_{h_2}, x_{h_3} ∈ {x_1, x_2, x_3, x_4}.

For any 1 ≤ h ≤ 4, let B_h = {σ ∈ G_{q²} : min{σ(x_1), σ(x_2), σ(x_3), σ(x_4)} = σ(x_h)} ⊆ G_{q²} be the event when σ ∈ G_{q²} is chosen uniformly at random. To show Lemma 2, the following claims suffice.

Claim 41. For the case (i), Pr[B_1] = Pr[B_2] = Pr[B_3] = Pr[B_4] = 1/4.

Proof: For each i ∈ [0, q], we have the following events: (i-1) i = a; (i-2) i ≠ a (see Figure 3). In a way similar to the proof of Claim 31, it is immediate that for each 1 ≤ h ≤ 4, Pr[B_h ∧ event (i-1)] = (1/(q + 1))·(1/4) and Pr[B_h ∧ event (i-2)] = (q/(q + 1))·(1/4). Thus Pr[B_1] = Pr[B_2] = Pr[B_3] = Pr[B_4] = 1/4. ■
Fig. 3. Events for the Case (i): 4-Restricted Min-Wise Independence (panel (i-1))

Fig. 4. Subevents of (ii-2): 4-Restricted Min-Wise Independence (panels (ii-2.{1,2,3}) and (ii-2.4))



Claim 42. For the case (ii), Pr[B_1] = Pr[B_2] = Pr[B_3] = Pr[B_4] = 1/4.

Proof: Without loss of generality, assume that there exists a unique line L_j^a ∈ C_a such that x_1 ∉ L_j^a and x_2, x_3, x_4 ∈ L_j^a (the other cases can be handled analogously). For each C_i ∈ 𝒞_q, we have the following events: (ii-1) i = a; (ii-2) i ≠ a.

For the event (ii-1), it is immediate that Pr[B_1 ∧ event (ii-1)] = (1/(q + 1))·(1/2), and for each 2 ≤ h ≤ 4, Pr[B_h ∧ event (ii-1)] = (1/(q + 1))·(1/2)·(1/3).

For the event (ii-2), we have three lines L^{i_2} ∈ C_{i_2}, L^{i_3} ∈ C_{i_3}, and L^{i_4} ∈ C_{i_4} such that x_1, x_2 ∈ L^{i_2}, x_1, x_3 ∈ L^{i_3}, and x_1, x_4 ∈ L^{i_4}. For each C_i ∈ 𝒞_q, we have the following subevents: (ii-2.1) i = i_2; (ii-2.2) i = i_3; (ii-2.3) i = i_4; (ii-2.4) i ∉ {i_2, i_3, i_4} (see Figure 4). Each of these occurs with the probability shown in Table 1. So Pr[B_1] = Pr[B_2] = Pr[B_3] = Pr[B_4] = 1/4. ■

Claim 43. For the case (iii), Pr[B_1] = Pr[B_2] = Pr[B_3] = Pr[B_4] = 1/4.

Proof: Without loss of generality, consider the following subcases:

(iii-1) there exist exactly two parallel line classes C_a, C_b ∈ 𝒞_q with the following property. There is a pair of lines L_{j_1}^a, L_{j_2}^a ∈ C_a such that x_1, x_2 ∈ L_{j_1}^a and x_3, x_4 ∈ L_{j_2}^a, and a pair of lines L_{k_1}^b, L_{k_2}^b ∈ C_b such that x_1, x_4 ∈ L_{k_1}^b and x_2, x_3 ∈ L_{k_2}^b;
(iii-2) there exists only one parallel line class C_a ∈ 𝒞_q for which there is a pair of lines L_{j_1}^a, L_{j_2}^a ∈ C_a such that x_1, x_2 ∈ L_{j_1}^a and x_3, x_4 ∈ L_{j_2}^a;
(iii-3) there exists no parallel line class C_a ∈ 𝒞_q that contains a pair of lines L_{j_1}^a, L_{j_2}^a ∈ C_a, each of which includes two of the points x_1, x_2, x_3, x_4.

For the subcase (iii-1), consider the following events, for each C_i ∈ 𝒞_q (see Figure 5):

(iii-1.1) i ∈ {a, b};
(iii-1.2) i ∈ [0, q] − {a, b} and some L_j^i ∈ C_i includes the points x_1, x_3;
(iii-1.3) i ∈ [0, q] − {a, b} and some L_j^i ∈ C_i includes the points x_2, x_4;
(iii-1.4) i ∈ [0, q] − {a, b} and no L_j^i ∈ C_i includes two of the points x_1, x_2, x_3, x_4.

Each of these subevents occurs with the probability shown in Table 2. Thus for the subcase (iii-1), Pr[B_1] = Pr[B_2] = Pr[B_3] = Pr[B_4] = 1/4. For the subcases (iii-2) and (iii-3), the claim can be shown analogously. ■



4.3 Proof of Theorem 4 

We can show Theorem 4 in a way similar to the proof of Theorem 3. 
Table 1. Probability of Subevents: The Case (ii)

                     B_1                 B_2                 B_3                 B_4
subevent (ii-2.1)    1/(q+1) · 1/6       1/(q+1) · 1/6       1/(q+1) · 1/3       1/(q+1) · 1/3
subevent (ii-2.2)    1/(q+1) · 1/6       1/(q+1) · 1/3       1/(q+1) · 1/6       1/(q+1) · 1/3
subevent (ii-2.3)    1/(q+1) · 1/6       1/(q+1) · 1/3       1/(q+1) · 1/3       1/(q+1) · 1/6
subevent (ii-2.4)    (q−3)/(q+1) · 1/4   (q−3)/(q+1) · 1/4   (q−3)/(q+1) · 1/4   (q−3)/(q+1) · 1/4






Fig. 5. Events for the Case (iii-1): 4-Restricted Min-Wise Independence



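For any integer q = 2^t > 4, we define a sequence {q_i}_{i∈[1,ℓ]} of integers by the function g in Section 2. We start with a family G_{q_1} = S_{q_1} = S_4, i.e., |G_{q_1}| = |S_4| = 4! = 24, and construct a family G_q ⊆ S_q of 4-restricted min-wise independent permutations as shown in Subsection 3.4. Recall that ℓ − 1 ≤ lg lg q. So

$$|G_{q_\ell}| = 4(q_{\ell-1}+1)\cdot|G_{q_{\ell-1}}| = 4^{\ell-1}\prod_{i=1}^{\ell-1}(q_i+1)\cdot|G_{q_1}| \le 24\cdot\lg^2 q\cdot\prod_{i=1}^{\ell-1}q_i\cdot\prod_{i=1}^{\ell-1}\left(1+\frac{1}{q_i}\right). \qquad (4)$$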
For any integer n > 4, let q = 2^t be the minimum integer such that n ≤ q. Note that q < 4n. So from Ineq. (4), Eqs. (1) and (2), and Proposition 1, it follows that for any integer n > 4, there exists a family F_n ⊆ S_n of 4-restricted min-wise independent permutations such that

$$|F_n| = |G_q| \le 3\sqrt{e}\cdot q\lg^3 q < 12\sqrt{e}\cdot n(2+\lg n)^3 = 12\sqrt{e}(1+o(1))\cdot n\lg^3 n.$$

In particular, we have the following corollary from Ineq. (4) and Eqs. (1) and (2).

Corollary 2. For any integer n = 2^{2^t} > 4, there exists a family F_n ⊆ S_n of 4-restricted min-wise independent permutations such that |F_n| ≤ 6√e·n lg² n.
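The same tabulation applies to Subsection 4.1. The only change is that each squaring step multiplies the size by 4(q + 1), with one factor of 2 each for X and Y. The following is a small sketch; the helper is ours. It compares the exact sizes with the bound of Corollary 2.

```python
import math


def minwise_sizes(steps, factor):
    """[(q, |G_q|)] for q = 4, 16, 256, ..., using |G_{q^2}| = factor * (q + 1) * |G_q|
    and |G_4| = |S_4| = 24 (factor = 4 for the construction of Subsection 4.1)."""
    q, size = 4, 24
    table = [(q, size)]
    for _ in range(steps):
        q, size = q * q, factor * (q + 1) * size
        table.append((q, size))
    return table


sizes = minwise_sizes(3, factor=4)
print(sizes)  # [(4, 24), (16, 480), (256, 32640), (65536, 33553920)]
for n, size in sizes[1:]:  # Corollary 2 needs n > 4
    assert size <= 6 * math.sqrt(math.e) * n * math.log2(n) ** 2
```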
Table 2. Probability of Subevents: The Subcase (iii-1)

                     B_1                 B_2                 B_3                 B_4
subevent (iii-1.1)   2/(q+1) · 1/4       2/(q+1) · 1/4       2/(q+1) · 1/4       2/(q+1) · 1/4
subevent (iii-1.2)   1/(q+1) · 1/6       1/(q+1) · 1/3       1/(q+1) · 1/3       1/(q+1) · 1/6
subevent (iii-1.3)   1/(q+1) · 1/3       1/(q+1) · 1/6       1/(q+1) · 1/6       1/(q+1) · 1/3
subevent (iii-1.4)   (q−3)/(q+1) · 1/4   (q−3)/(q+1) · 1/4   (q−3)/(q+1) · 1/4   (q−3)/(q+1) · 1/4
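Each entry of Tables 1 and 2 is a conditional probability within one parallel class, weighted by the fraction of the q + 1 classes in which the subevent occurs. The following sketch is our own bookkeeping with exact fractions. It adds the event (ii-1) values from Claim 42 to the rows of Table 1, and takes the rows of Table 2 as printed. It then confirms that every column sums to Pr[B_h] = 1/4 for several values of q.

```python
from fractions import Fraction as Fr

# Rows: (number of parallel classes as a function of q, [Pr of B_1..B_4 within such a class]).
CASE_II = [  # event (ii-1) from Claim 42, then the four rows of Table 1
    (lambda q: 1, [Fr(1, 2), Fr(1, 6), Fr(1, 6), Fr(1, 6)]),
    (lambda q: 1, [Fr(1, 6), Fr(1, 6), Fr(1, 3), Fr(1, 3)]),
    (lambda q: 1, [Fr(1, 6), Fr(1, 3), Fr(1, 6), Fr(1, 3)]),
    (lambda q: 1, [Fr(1, 6), Fr(1, 3), Fr(1, 3), Fr(1, 6)]),
    (lambda q: q - 3, [Fr(1, 4)] * 4),
]
CASE_III_1 = [  # the four rows of Table 2
    (lambda q: 2, [Fr(1, 4)] * 4),
    (lambda q: 1, [Fr(1, 6), Fr(1, 3), Fr(1, 3), Fr(1, 6)]),
    (lambda q: 1, [Fr(1, 3), Fr(1, 6), Fr(1, 6), Fr(1, 3)]),
    (lambda q: q - 3, [Fr(1, 4)] * 4),
]


def column_sums(rows, q):
    """Pr[B_h] = sum over rows of (#classes / (q + 1)) * Pr[B_h | class]."""
    return [sum(Fr(count(q), q + 1) * probs[h] for count, probs in rows)
            for h in range(4)]


for q in (3, 4, 5, 7, 8, 9, 16):
    assert column_sums(CASE_II, q) == [Fr(1, 4)] * 4
    assert column_sums(CASE_III_1, q) == [Fr(1, 4)] * 4
print(column_sums(CASE_II, 16))  # [Fraction(1, 4), Fraction(1, 4), Fraction(1, 4), Fraction(1, 4)]
```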



5 4-Rankwise Independent Permutations

For a family F ⊆ S_n of 4-rankwise independent permutations, it follows from Theorem 2 that |F| ≥ ⌊n/2⌋^{…}. To show Theorem 5, we construct a 4-rankwise independent permutation family F_n ⊆ S_n such that |F_n| = n^{3+o(1)}, which is close to the result of Theorem 2.

5.1 Projective Planes

Let q be a prime power. A projective plane PG(2, q) of order q consists of q² + q + 1 points and has the following properties [8]:

Property 1. A projective plane PG(2, q) of order q satisfies (P1) every line has q + 1 points; (P2) any two points lie on a unique line; (P3) any point lies on q + 1 lines; (P4) there are q² + q + 1 lines; (P5) any two lines meet in a unique point.

From P3 of Property 1, it follows that for each point s ∈ PG(2, q), there exists a set L^s = {ℓ_0^s, ℓ_1^s, ..., ℓ_q^s} of q + 1 lines, each of which passes through the point s. From P1 of Property 1, each line ℓ_i^s ∈ L^s consists of q + 1 points, i.e., ℓ_i^s = {p_{i,0}^s, p_{i,1}^s, ..., p_{i,q−1}^s, s}. Arrange the q² + q + 1 points of PG(2, q) naturally. For each s ∈ PG(2, q), let f_L^s : PG(2, q) − {s} → [0, q] and f_P^s : PG(2, q) − {s} → [0, q − 1] be functions such that for any point x ∈ PG(2, q) − {s}, f_L^s(x) = i and f_P^s(x) = j if x = p_{i,j}^s, i.e., there exists a line ℓ_i^s ∈ L^s on which x = p_{i,j}^s ∈ ℓ_i^s. For each point s ∈ PG(2, q), note that for any pair of points x, y ∈ PG(2, q) − {s}, x ≠ y iff (f_L^s(x), f_P^s(x)) ≠ (f_L^s(y), f_P^s(y)).
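Property 1, and the splitting of PG(2, q) − {s} by the lines through s, can be checked on small planes. The sketch below assumes q is prime; the text uses q = 2^t, which would need GF(2^t) arithmetic. It uses the standard model of PG(2, q): points and lines are both the 1-dimensional subspaces of F_q³, and a point lies on a line when their dot product is 0 mod q. The function names are ours.

```python
from itertools import combinations, product


def normalize(v, p):
    """Scale a nonzero vector over F_p (p prime) so its first nonzero entry is 1."""
    lead = next(c for c in v if c)
    inv = pow(lead, p - 2, p)  # inverse of lead by Fermat's little theorem
    return tuple(c * inv % p for c in v)


def projective_plane(p):
    """Points and lines of PG(2, p); a line is the set of points orthogonal to a vector."""
    pts = sorted({normalize(v, p) for v in product(range(p), repeat=3) if any(v)})
    lines = [frozenset(x for x in pts if sum(a * b for a, b in zip(x, u)) % p == 0)
             for u in pts]
    return pts, lines


def check_plane(p):
    pts, lines = projective_plane(p)
    N = p * p + p + 1
    assert len(pts) == N and len(set(lines)) == N                             # P4
    assert all(len(line) == p + 1 for line in lines)                          # P1
    assert all(sum({x, y} <= line for line in lines) == 1
               for x, y in combinations(pts, 2))                              # P2
    assert all(sum(x in line for line in lines) == p + 1 for x in pts)        # P3
    assert all(len(l1 & l2) == 1 for l1, l2 in combinations(lines, 2))        # P5
    # The q + 1 lines through s split PG(2, q) - {s} into q + 1 blocks of q points,
    # which is why (f_L^s, f_P^s) is a bijection onto [0, q] x [0, q - 1].
    s = pts[0]
    blocks = [line - {s} for line in lines if s in line]
    assert len(blocks) == p + 1 and all(len(b) == p for b in blocks)
    assert set().union(*blocks) == set(pts) - {s}
    return True


print(all(check_plane(p) for p in (2, 3, 5)))  # True
```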

5.2 3-Wise Independent 0/1-Random Variables

Alon, Babai, and Itai [1] showed the following result on the construction of k-wise independent random variables with a small sample space.

Proposition 2 ([1]). Let m = 2^r − 1 and k = 2t + 1 ≤ m be integers. Then there exist k-wise independent random variables X_0, X_1, ..., X_{m−1} : Ω → {0, 1} for which the distribution on the sample space Ω is uniform, |Ω| = 2(m + 1)^t, and Pr[X_0 = 1] = Pr[X_1 = 1] = ··· = Pr[X_{m−1} = 1] = 1/2.
To construct a small family of 4-rankwise independent permutations, we apply the following proposition, which is a special case of Proposition 2.

Proposition 3. For any integer n > 3, let m = 2^r − 1 ≥ n. Then there exist 3-wise independent random variables X_0, X_1, ..., X_{n−1} : Ω → {0, 1} for which the distribution on the sample space Ω is uniform, |Ω| = 2(m + 1), and Pr[X_0 = 1] = Pr[X_1 = 1] = ··· = Pr[X_{n−1} = 1] = 1/2.
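For k = 3, one simple way to obtain a sample space of exactly the size in Proposition 3 is a linear construction over GF(2). We include it only as an illustration and do not claim it is the construction of [1]. Each sample point is a pair (a, b) with a ∈ {0,1}^r and b ∈ {0,1}, and X_i(a, b) = ⟨a, v_i⟩ + b mod 2, where v_i is the binary expansion of i. Any three distinct vectors (v_i, 1) are linearly independent over GF(2), so every three of the X_i are independent and uniform. With r = t + 1 and n = q + 2, this gives the 4q-point space used in Subsection 5.3.

```python
from collections import Counter
from itertools import combinations


def three_wise_bits(r, n):
    """Omega as a list of n-bit tuples; |Omega| = 2^(r+1) = 2(m+1) with m = 2^r - 1 >= n."""
    assert n <= 2 ** r - 1
    return [tuple((bin(a & i).count("1") + b) % 2 for i in range(n))
            for a in range(2 ** r) for b in (0, 1)]


def is_3wise_uniform(omega, n):
    """Every triple (X_i, X_j, X_k) takes each of its 8 values on |Omega| / 8 points."""
    for idx in combinations(range(n), 3):
        counts = Counter(tuple(w[j] for j in idx) for w in omega)
        if len(counts) != 8 or any(8 * c != len(omega) for c in counts.values()):
            return False
    return True


t = 2
q = 2 ** t
omega = three_wise_bits(t + 1, q + 2)  # X_0, ..., X_{q+1} as in Subsection 5.3
print(len(omega), is_3wise_uniform(omega, q + 2))  # 16 True
```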

5.3 Construction of 4-Rankwise Independent Permutations

For any t ≥ 2, let q = 2^t and m = 2^{t+1} − 1. For convenience, we identify PG(2, q) with [0, q² + q]. Note that for any t ≥ 2, q + 2 ≤ m and m + 1 = 2q. It follows from Proposition 3 that there exist 3-wise independent random variables X_0, X_1, ..., X_{q+1} : Ω → {0, 1} for which the distribution on the sample space Ω is uniform, |Ω| = 2(m + 1) = 4q, and Pr[X_0 = 1] = Pr[X_1 = 1] = ··· = Pr[X_{q+1} = 1] = 1/2. Assume that there exists a (small) family G_{q+1} ⊆ S_{q+1} of 4-rankwise independent permutations. Informally, our construction of 4-rankwise independent permutations works as follows:

- choose a point s ∈ PG(2, q) uniformly at random;
- map the point s to the minimum or to the maximum of PG(2, q), each with probability 1/2;
- permute the q + 1 lines ℓ_i^s ∈ L^s in a 4-rankwise independent manner;
- permute the q points p_{i,j}^s on each ℓ_i^s − {s} in a 4-rankwise independent manner;
- reverse the line permutation and the point permutations in a 3-wise independent manner.

More formally,

(1) For each s ∈ PG(2, q), each sample point ω ∈ Ω of X_0, X_1, ..., X_{q+1} : Ω → {0, 1}, each λ ∈ {0, 1}, and each π ∈ G_{q+1}, define σ : PG(2, q) → PG(2, q) such that σ(s) = (q² + q)λ and for each x ∈ PG(2, q) − {s},

σ(x) = [RV_{q+1}^{…} ∘ π(f_L^s(x))]·q + … + 1 − λ.

(2) Let G_{q²+q+1} = {σ : s ∈ PG(2, q), λ ∈ {0, 1}, π ∈ G_{q+1}, ω ∈ Ω}.

It is not difficult to see that G_{q²+q+1} is a family of permutations, i.e., G_{q²+q+1} ⊆ S_{q²+q+1}. To show Theorem 5, the following lemma is applied recursively.

Lemma 3. For any prime power q, if a family G_{q+1} ⊆ S_{q+1} of permutations is 4-rankwise independent, then the family G_{q²+q+1} ⊆ S_{q²+q+1} of permutations is 4-rankwise independent.

Proof (Sketch): For any distinct x_1, x_2, x_3, x_4 ∈ PG(2, q), we consider the following cases:

(i) there exists a unique line ℓ(x_1, x_2, x_3, x_4) including x_1, x_2, x_3, x_4;
(ii) there exists a unique line ℓ(x_i, x_j, x_k) including three points x_i, x_j, x_k ∈ {x_1, x_2, x_3, x_4};
(iii) there exists no line ℓ(x_i, x_j, x_k) including three points x_i, x_j, x_k ∈ {x_1, x_2, x_3, x_4}.

Let E = {σ ∈ G_{q²+q+1} : σ(x_1) < σ(x_2) < σ(x_3) < σ(x_4)} ⊆ G_{q²+q+1} be the event when σ ∈ G_{q²+q+1} is chosen uniformly at random. By Property 1, we can show that for each of the cases (i), (ii), and (iii), Pr[E] = 1/4! = 1/24. ■
5.4 Proof of Theorem 5

For any integer q = 2^t > 4, define a sequence {q_i}_{i∈[1,ℓ]} of integers by the function g given in Section 2. Assume that for any i ∈ [2, ℓ], there exists a family G_{q_{i−1}+1} ⊆ S_{q_{i−1}+1} of 4-rankwise independent permutations. As in Subsection 5.3, construct a family G_{q_{i−1}²+q_{i−1}+1} ⊆ S_{q_{i−1}²+q_{i−1}+1} of 4-rankwise independent permutations from the family G_{q_{i−1}+1} ⊆ S_{q_{i−1}+1}. Note that for each i ∈ [2, ℓ], q_i + 1 ≤ q_{i−1}² + q_{i−1} + 1. By Proposition 1, we transform G_{q_{i−1}²+q_{i−1}+1} ⊆ S_{q_{i−1}²+q_{i−1}+1} into a family G_{q_i+1} ⊆ S_{q_i+1} of 4-rankwise independent permutations. We can start with any family G_{q_1+1} ⊆ S_{q_1+1} = S_5 of 4-rankwise independent permutations. Here we take G_{q_1+1} = S_5, i.e., |G_{q_1+1}| = |S_5| = 5! = 120. Recall that ℓ − 1 ≤ lg lg q. So we have that



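$$|G_{q_\ell+1}| = |G_{q_{\ell-1}^2+q_{\ell-1}+1}| = 8(q_{\ell-1}^2+q_{\ell-1}+1)\,q_{\ell-1}\,|G_{q_{\ell-1}+1}| = 8^{\ell-1}\prod_{i=1}^{\ell-1}q_i^3\left(1+\frac{1}{q_i}+\frac{1}{q_i^2}\right)\cdot|G_{q_1+1}|$$
$$\le 120\lg^3 q\cdot\left(\prod_{i=1}^{\ell-1}q_i\right)^3\cdot\prod_{i=1}^{\ell-1}\left(1+\frac{1}{q_i}+\frac{1}{q_i^2}\right). \qquad (5)$$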



For any integer n > 4, let q = 2^t be the minimum integer such that n ≤ q + 1. Note that q < 4n. So from Ineq. (5), Eqs. (1) and (2), and Proposition 1, it follows that for any integer n > 4, there exists a family F_n ⊆ S_n of 4-rankwise independent permutations such that

$$|F_n| = |G_{q+1}| \le \frac{15e}{64}\cdot q^3\lg^6 q < 15e\cdot n^3(2+\lg n)^6 = 15e(1+o(1))\cdot n^3\lg^6 n.$$

In particular, we have the following corollary from Ineq. (5) and Eqs. (1) and (2).

Corollary 3. For any integer n = 2^{2^t} > 4, there exists a family F_n ⊆ S_n of 4-rankwise independent permutations such that |F_n| ≤ (15e/8)·n³ lg³ n.
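The exact sizes behind Corollary 3 follow from the family in Subsection 5.3. Its size is |G_{q²+q+1}| = (q² + q + 1)·2·|G_{q+1}|·4q: one factor each for the choice of s, of λ, of π, and of a sample point of Ω. Proposition 1 does not change the size. The following is a small sketch; the helper name is ours. It starts from G_5 = S_5, tabulates |G_{q+1}| for q = 4, 16, 256, 65536, and compares each size with the bound of Corollary 3 for n = q.

```python
import math


def rankwise_sizes(steps):
    """[(q, |G_{q+1}|)] for q = 4, 16, 256, ..., using
    |G_{q^2+q+1}| = 8 q (q^2 + q + 1) |G_{q+1}| and then restricting to S_{q^2+1}."""
    q, size = 4, 120  # G_{q_1 + 1} = S_5
    table = [(q, size)]
    for _ in range(steps):
        q, size = q * q, 8 * q * (q * q + q + 1) * size
        table.append((q, size))
    return table


sizes = rankwise_sizes(3)
print(sizes[:3])  # [(4, 120), (16, 80640), (256, 2817884160)]
for n, size in sizes[1:]:  # n = q = 2^{2^t} satisfies n <= q + 1
    assert size <= 15 * math.e / 8 * n ** 3 * math.log2(n) ** 3  # Corollary 3
```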